Re: [zfs-discuss] slog tests on read throughput exhaustion (NFS)

2007-11-17 Thread Joe Little
On Nov 16, 2007 10:41 PM, Neil Perrin [EMAIL PROTECTED] wrote:


 Joe Little wrote:
  On Nov 16, 2007 9:13 PM, Neil Perrin [EMAIL PROTECTED] wrote:
  Joe,
 
  I don't think adding a slog helped in this case. In fact I
  believe it made performance worse. Previously the ZIL would be
  spread out over all devices but now all synchronous traffic
  is directed at one device (and everything is synchronous in NFS).
  Mind you, 15MB/s seems a bit on the slow side - especially if
  cache flushing is disabled.
 
  It would be interesting to see what all the threads are waiting
  on. I think the problem may be that everything is backed
  up waiting to start a transaction because the txg train is
  slow due to NFS requiring the ZIL to push everything synchronously.
 
 
  I agree completely. The log (even though slow) was an attempt to
  isolate writes away from the pool. I guess the question is how to
  provide for async access for NFS. We may have 16, 32 or however many
  threads, but if a single writer keeps the ZIL pegged and prohibits
  reads, it's all for nought. Is there any way to tune/configure the
  ZFS/NFS combination to balance reads and writes so that one doesn't
  starve the other? It's either feast or famine, or so tests have shown.

 No, there's no way currently to give reads preference over writes.
 All transactions get equal priority to enter a transaction group.
 Three txgs can be outstanding, as we use a 3-phase commit model:
 open, quiescing, and syncing.


Any way to improve the balance? It would appear that zil_disable is
still a requirement to get NFS to behave in a practical, real-world
way with ZFS. Even with zil_disable, we end up with periods of
pausing on the heaviest of writes, and then I think it's mostly just
ZFS having too much outstanding I/O to commit.

If zil_disable is enabled, is the slog disk ignored?
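
(For reference, a hedged sketch of how zil_disable was typically set
on Solaris/OpenSolaris builds of that era -- either in /etc/system or
live with mdb. Illustrative only, not a recommendation, since it gives
up the synchronous-write guarantees that NFS clients expect:

  * in /etc/system, takes effect at the next boot
  set zfs:zil_disable = 1

  * or live, on a running kernel
  echo zil_disable/W0t1 | mdb -kw

With the intent log disabled, nothing should be written to the
separate log device at all, which would presumably leave the slog
idle.)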

 Neil.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool io to 6140 is really slow

2007-11-17 Thread Asif Iqbal
On Nov 17, 2007 9:12 AM, Louwtjie Burger [EMAIL PROTECTED] wrote:
 You have a 6140 with SAS drives?! When did this happen?

OOPS! I meant FC-AL




 On Nov 17, 2007 12:30 AM, Asif Iqbal [EMAIL PROTECTED] wrote:
  I have the following layout
 
  A 490 with 8 1.8GHz CPUs and 16G of memory. 6 6140s with 2 FC controllers,
  using the A1 and B1 controller ports at 4Gbps.
  Each controller has 2G NVRAM.
 
  On the 6140s I set up a raid0 LUN per SAS disk with a 16K segment size.
 
  On the 490 I created a zpool with 8 4+1 raidz1s.
 
  I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
  /etc/system
 
  Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
 
  Currently each LUN is set up as primary A1 and secondary B1 or vice versa.
 
  I also have the write cache enabled according to CAM.
 
  --
  Asif Iqbal
  PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu

  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
 




-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool io to 6140 is really slow

2007-11-17 Thread Asif Iqbal
(Including storage-discuss)

I have 6 6140s with 96 disks, 64 of which are Seagate
ST337FC (300GB, 10K RPM, FC-AL).

I created 16K segment size raid0 LUNs, one per FC-AL disk. Then I
created a zpool with 8 4+1 raidz1 vdevs out of those single-disk
LUNs. I also set zfs_nocacheflush to `1' to
take advantage of the 2G NVRAM cache of the controllers.
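
(To make that layout concrete, here is a rough sketch of the pool
creation under stated assumptions: the c2t*d0 device names below are
hypothetical -- real 6140 LUNs under MPxIO would show up with long
WWN-based names -- and the grouping is simply 8 raidz1 vdevs of 5
single-disk LUNs each, 40 LUNs in total:

  zpool create tank \
    raidz1 c2t0d0  c2t1d0  c2t2d0  c2t3d0  c2t4d0  \
    raidz1 c2t5d0  c2t6d0  c2t7d0  c2t8d0  c2t9d0  \
    raidz1 c2t10d0 c2t11d0 c2t12d0 c2t13d0 c2t14d0 \
    raidz1 c2t15d0 c2t16d0 c2t17d0 c2t18d0 c2t19d0 \
    raidz1 c2t20d0 c2t21d0 c2t22d0 c2t23d0 c2t24d0 \
    raidz1 c2t25d0 c2t26d0 c2t27d0 c2t28d0 c2t29d0 \
    raidz1 c2t30d0 c2t31d0 c2t32d0 c2t33d0 c2t34d0 \
    raidz1 c2t35d0 c2t36d0 c2t37d0 c2t38d0 c2t39d0

plus, in /etc/system, the cache-flush tunable already mentioned:

  set zfs:zfs_nocacheflush = 1
)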

I am using one port per controller; the rest are down (not in use).
Each controller port runs at 4Gbps.

All LUNs have one controller as primary and the other as secondary.

I am getting only 125MB/s according to the zpool IO.

I should get ~ 512MB/s per IO.

Also, is it possible to get 2GB/s IO by using the leftover ports of the
controllers?

Is it also possible to get 4GB/s IO by aggregating the controllers (with
8 ports total)?
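
(Back-of-the-envelope, ignoring FC framing and protocol overhead:
4Gb/s divided by 8 is roughly 500MB/s, so ~512MB/s is about the raw
ceiling of a single 4Gbps port; two active ports would top out around
1GB/s raw, and all eight around 4GB/s -- assuming the host HBAs, the
V490's I/O buses, and the back-end disks could actually keep up, which
in practice they may not.)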



On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
 I have the following layout

 A 490 with 8 1.8GHz CPUs and 16G of memory. 6 6140s with 2 FC controllers,
 using the A1 and B1 controller ports at 4Gbps.
 Each controller has 2G NVRAM.

 On the 6140s I set up a raid0 LUN per SAS disk with a 16K segment size.

 On the 490 I created a zpool with 8 4+1 raidz1s.

 I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
 /etc/system

 Is there a way I can improve the performance? I'd like to get 1GB/sec IO.

 Currently each LUN is set up as primary A1 and secondary B1 or vice versa.

 I also have the write cache enabled according to CAM.

 --
 Asif Iqbal
 PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu




-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS

2007-11-17 Thread can you guess?
 
 I've been observing two threads on zfs-discuss with the following
 Subject lines:

 Yager on ZFS
 ZFS + DB + fragments

 and have reached the rather obvious conclusion that the author "can
 you guess?" is a professional spinmeister,

Ah - I see we have another incompetent psychic chiming in - and judging by his
drivel below, a technical incompetent as well.  While I really can't help him
with the former area, I can at least try to educate him in the latter.

...

 Excerpt 1:  Is this premium technical BullShit (BS)
 or what?

Since you asked:  no, it's just clearly beyond your grade level, so I'll try to 
dumb it down enough for you to follow.

 
 - BS 301 'grad level technical BS'
 ---

 Still, it does drive up snapshot overhead, and if you start trying to
 use snapshots to simulate 'continuous data protection' rather than
 more sparingly the problem becomes more significant (because each
 snapshot will catch any background defragmentation activity at a
 different point, such that common parent blocks may appear in more
 than one snapshot even if no child data has actually been updated).
 Once you introduce CDP into the process (and it's tempting to, since
 the file system is in a better position to handle it efficiently than
 some add-on product), rethinking how one approaches snapshots (and COW
 in general) starts to make more sense.

Do you by any chance not even know what 'continuous data protection' is?  It's 
considered a fairly desirable item these days and was the basis for several hot 
start-ups (some since gobbled up by bigger fish that apparently agreed that 
they were onto something significant), since it allows you to roll back the 
state of individual files or the system as a whole to *any* historical point 
you might want to (unlike snapshots, which require that you anticipate points 
you might want to roll back to and capture them explicitly - or take such 
frequent snapshots that you'll probably be able to get at least somewhere near 
any point you might want to, a second-class simulation of CDP which some 
vendors offer because it's the best they can do and is precisely the activity 
which I outlined above, expecting that anyone sufficiently familiar with file 
systems to be able to follow the discussion would be familiar with it).

But given your obvious limitations I guess I should spell it out in words of 
even fewer syllables:

1.  Simulating CDP without actually implementing it means taking very frequent 
snapshots.

2.  Taking very frequent snapshots means that you're likely to interrupt 
background defragmentation activity such that one child of a parent is moved 
*before* the snapshot is taken while another is moved *after* the snapshot is 
taken, resulting in the need to capture a before-image of the parent (because 
at least one of its pointers is about to change) *and all ancestors of the 
parent* (because the pointer change will propagate through all the ancestral 
checksums - and pointers, with COW) in every snapshot that occurs immediately 
prior to moving *any* of its children rather than just having to capture a 
single before-image of the parent and all its ancestors after which all its 
child pointers will likely get changed before the next snapshot is taken.

So that's what any competent reader should have been able to glean from the 
comments that stymied you.  The paragraph's concluding comments were 
considerably more general in nature and thus legitimately harder to follow:  
had you asked for clarification rather than just assumed that they were BS 
simply because you couldn't understand them you would not have looked like such 
an idiot, but since you did call them into question I'll now put a bit more 
flesh on them for those who may be able to follow a discussion at that level of 
detail:

3.  The file system is in a better position to handle CDP than some external 
mechanism because

a) the file system knows (right down to the byte level if it wants to) exactly 
what any individual update is changing,

b) the file system knows which updates are significant (e.g., there's probably 
no intrinsic need to capture rollback information for lazy writes because the 
application didn't care whether they were made persistent at that time, but for 
any explicitly-forced writes or syncs a rollback point should be established), 
and

c) the file system is already performing log forces (where a log is involved) 
or batch disk updates (a la ZFS) to honor such application-requested 
persistence, and can piggyback the required CDP before-image persistence on 
them rather than requiring separate synchronous log or disk accesses to do so.

4.  If you've got full-fledged CDP, it's questionable whether you need 
snapshots as well (unless you have really, really inflexible requirements for 
virtually instantaneous rollback and/or for high-performance writable-clone 
access) - and if CDP turns out to be this decade's important new file 

Re: [zfs-discuss] [storage-discuss] zpool io to 6140 is really slow

2007-11-17 Thread Asif Iqbal
On Nov 17, 2007 2:55 PM, Torrey McMahon [EMAIL PROTECTED] wrote:
 Have you tried disabling the zil cache flushing?

I already have zfs_nocacheflush set to 1 to take advantage of the
NVRAM of the raid controllers:

  set zfs:zfs_nocacheflush = 1



 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Cache_Flushes


 Asif Iqbal wrote:
  (Including storage-discuss)
 
  I have 6 6140s with 96 disks, 64 of which are Seagate
  ST337FC (300GB, 10K RPM, FC-AL).
 
  I created 16K segment size raid0 LUNs, one per FC-AL disk. Then I
  created a zpool with 8 4+1 raidz1 vdevs out of those single-disk
  LUNs. I also set zfs_nocacheflush to `1' to
  take advantage of the 2G NVRAM cache of the controllers.
 
  I am using one port per controller; the rest are down (not in use).
  Each controller port runs at 4Gbps.
 
  All LUNs have one controller as primary and the other as secondary.
 
  I am getting only 125MB/s according to the zpool IO.
 
  I should get ~ 512MB/s per IO.
 
  Also, is it possible to get 2GB/s IO by using the leftover ports of the
  controllers?
 
  Is it also possible to get 4GB/s IO by aggregating the controllers (with
  8 ports total)?
 
 
 
  On Nov 16, 2007 5:30 PM, Asif Iqbal [EMAIL PROTECTED] wrote:
 
  I have the following layout
 
  A 490 with 8 1.8GHz CPUs and 16G of memory. 6 6140s with 2 FC controllers,
  using the A1 and B1 controller ports at 4Gbps.
  Each controller has 2G NVRAM.
 
  On the 6140s I set up a raid0 LUN per SAS disk with a 16K segment size.
 
  On the 490 I created a zpool with 8 4+1 raidz1s.
 
  I am getting zpool IO of only 125MB/s with zfs:zfs_nocacheflush = 1 in
  /etc/system
 
  Is there a way I can improve the performance? I'd like to get 1GB/sec IO.
 
  Currently each LUN is set up as primary A1 and secondary B1 or vice versa.
 
  I also have the write cache enabled according to CAM.
 
  --
  Asif Iqbal
  PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
 
 
 
 
 
 





-- 
Asif Iqbal
PGP Key: 0xE62693C5 KeyServer: pgp.mit.edu
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS

2007-11-17 Thread Rich Teer
On Sat, 17 Nov 2007, can you guess? wrote:

 Ah - I see we have another incompetent psychic chiming in - and
 judging by his drivel below a technical incompetent as well.  While I
 really can't help him with the former area, I can at least try to
 educate him in the latter.

I should know better than to reply to a troll, but I can't let this
personal attack stand.  I know Al, and I can tell you for a fact that
he is *far* from technically incompetent.

Judging from the length of your diatribe (which I didn't bother reading),
you seem to subscribe to the "if you can't blind 'em with science,
baffle them with bullshit" school of thought.  I'd take the word of
any number of people on this list over yours, any day.

HAND,

-- 
Rich Teer, SCSA, SCNA, SCSECA, OGB member

CEO,
My Online Home Inventory

URLs: http://www.rite-group.com/rich
  http://www.linkedin.com/in/richteer
  http://www.myonlinehomeinventory.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] pls discontinue troll bait was: Yager on ZFS and ZFS

2007-11-17 Thread Tim Spriggs
<troll bait>

Rich Teer wrote:
 I should know better than to reply to a troll, but I can't let this
 personal attack stand.  I know Al, and I can tell you for a fact that
 he is *far* from technically incompetent.

 Judging from the length of your diatribe (which I didn't bother reading),
 you seem to subscribe to the "if you can't blind 'em with science,
 baffle them with bullshit" school of thought.  I'd take the word of
 any number of people on this list over yours, any day.

 HAND,
   

I'm sure this troll will reply to you as he did to me. I just can't help
laughing at his responses anymore. I do find it odd that someone has so
much time on their hands to just post such remarks. It's as if they
think they are doing themselves or the world a favor.


</troll bait>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss