Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Tracey Bernath
On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone d...@geek.com.au wrote:

 On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
  Now, to add the second SSD ZIL/L2ARC for a mirror.

 Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
 add more devices and let them load-balance.   This is especially true
 if you're sharing ssd writes with ZIL, as slices on the same devices.
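
For concreteness, a minimal sketch of that approach, assuming the existing
log/cache slices shown later in this thread (c0t0d4s0 and c0t0d4s1) and a
second SSD at a hypothetical c0t0d5 sliced the same way:

  # mirror the separate log (ZIL) by attaching the new SSD's log slice
  zpool attach dpool c0t0d4s0 c0t0d5s0

  # don't mirror the L2ARC: just add the new SSD's cache slice as a second
  # cache device and let ZFS load-balance reads across both
  zpool add dpool cache c0t0d5s1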

Well, the problem I am trying to solve is: wouldn't it read 2x faster with
the mirror?  It seems that once I can drive the single device to 10 queued
actions and 100% busy, it would be more useful to have two channels to the
same data. Is ZFS not smart enough to understand that there are two
identical mirror devices in the cache and split requests between them? Or
are you saying that ZFS is smart enough to cache it in two places, although
not mirrored?

If the device itself were full, and items were falling off the L2ARC, then I
could see having two separate cache devices, but since I am only at about
50% of the available capacity while maxing out the I/O, mirroring seemed
smarter.

Am I missing something here?

Tracey



  I may even splurge for one more to get a three way mirror.

 With more devices, questions about selecting different devices
 appropriate for each purpose come into play.

  Now I need a bigger server

 See? :)

 --
 Dan.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-15 Thread Tracey Bernath
For those following the saga:
With the prefetch problem fixed, and data coming off the L2ARC instead of
the disks, the system switched from I/O bound to CPU bound. I opened up the
throttles with some explicit PARALLEL hints in the Oracle commands, and we
were finally able to max out the single SSD:


    r/s    w/s     kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
  826.0    3.2 104361.8   35.2  0.0  9.9    0.0   12.0   3 100 c0t0d4

So, when we maxed out the SSD cache, it was delivering 100+ MB/s and 830
IOPS, with 3.4 TB behind it in a 4-disk SATA RAIDZ1.

Still have to remap it to 8 KB blocks to get more efficiency, but for raw
numbers it's exactly what I was looking for. Now, to add the second SSD
ZIL/L2ARC for a mirror. I may even splurge for one more to get a three-way
mirror. That will completely saturate the SCSI channel. Now I need a bigger
server...
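
For reference, the explicit PARALLEL usage mentioned above might look
something like this (a sketch only, with a hypothetical owner/index name and
degree), run from sqlplus:

  sqlplus -s "/ as sysdba" <<'EOF'
  -- rebuild with an explicit parallel degree to keep the cache device busy
  ALTER INDEX app_owner.big_index REBUILD PARALLEL 8;
  EOF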

Did I mention it was $1000 for the whole setup? Bah-ha-ha-ha.

Tracey


On Sat, Feb 13, 2010 at 11:51 PM, Tracey Bernath tbern...@ix.netcom.com wrote:

 OK, that was the magic incantation I was looking for:
 - changing the noprefetch option opened the floodgates to the L2ARC
 - changing the max queue depth relieved the wait time on the drives,
 although I may undo this again in the benchmarking since these drives all
 have NCQ

 I went from all four disks of the array at 100%, doing about 170 read
 IOPS / 25 MB/s, to all four disks of the array at 0%, once hitting nearly
 500 IOPS / 65 MB/s off the cache drive (at only 50% load).
 This bodes well for adding a second mirrored cache drive to push for
 1K IOPS.

 Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
 be ready
 for some production benchmarking.


 BEFORE:
  device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
  sd0     170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
  sd1     168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
  sd2     172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
  sd4     170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
  sd5       1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31



 AFTER:
 extended device statistics
    r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d1
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d2
    0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d3
  285.2    0.8 36236.2   14.4  0.0  0.5    0.0    1.8   1  37 c0t0d4


 And, keep  in mind this was on less than $1000 of hardware.

 Thanks for the pointers guys,
 Tracey



 On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling richard.ell...@gmail.com wrote:

 comment below...

 On Feb 12, 2010, at 2:25 PM, TMB wrote:
  I have a similar question: I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD for cache:
  # zpool status dpool
   pool: dpool
  state: ONLINE
  scrub: none requested
  config:
 
         NAME        STATE     READ WRITE CKSUM
         dpool       ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t0d1  ONLINE       0     0     0
             c0t0d2  ONLINE       0     0     0
             c0t0d3  ONLINE       0     0     0
         logs
           c0t0d4s0  ONLINE       0     0     0
         cache
           c0t0d4s1  ONLINE       0     0     0
         spares
           c0t0d6    AVAIL
           c0t0d7    AVAIL
 
                 capacity     operations    bandwidth
  pool          used  avail   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  dpool        72.1G  3.55T    237     12  29.7M   597K
    raidz1     72.1G  3.55T    237      9  29.7M   469K
      c0t0d0       -      -    166      3  7.39M   157K
      c0t0d1       -      -    166      3  7.44M   157K
      c0t0d2       -      -    166      3  7.39M   157K
      c0t0d3       -      -    167      3  7.45M   157K
    c0t0d4s0      20K  4.97G      0      3      0   127K
  cache            -      -      -      -      -      -
    c0t0d4s1   17.6G  36.4G      3      1   249K   119K
  -----------  -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
  although iostat shows there is just write traffic hitting the SSD:
 
                    extended device statistics                  cpu
  device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id

Re: [zfs-discuss] SSD and ZFS

2010-02-14 Thread Tracey Bernath
OK, that was the magic incantation I was looking for:
- changing the noprefetch option opened the floodgates to the L2ARC
- changing the max queue depth relieved the wait time on the drives, although
I may undo this again in the benchmarking since these drives all have NCQ

I went from all four disks of the array at 100%, doing about 170 read
IOPS / 25 MB/s, to all four disks of the array at 0%, once hitting nearly
500 IOPS / 65 MB/s off the cache drive (at only 50% load).
This bodes well for adding a second mirrored cache drive to push for
1K IOPS.

Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
be ready
for some production benchmarking.


 device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
 sd0     170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
 sd1     168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
 sd2     172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
 sd4     170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
 sd5       1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31

extended device statistics
   r/s    w/s    kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d0
   0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d1
   0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d2
   0.0    0.0     0.0    0.0  0.0  0.0    0.0    0.0   0   0 c0t0d3
 285.2    0.8 36236.2   14.4  0.0  0.5    0.0    1.8   1  37 c0t0d4


And, keep  in mind this was on less than $1000 of hardware.

Thanks,
Tracey


On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling richard.ell...@gmail.com wrote:

 comment below...

 On Feb 12, 2010, at 2:25 PM, TMB wrote:
  I have a similar question: I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD for cache:
  # zpool status dpool
   pool: dpool
  state: ONLINE
  scrub: none requested
  config:
 
         NAME        STATE     READ WRITE CKSUM
         dpool       ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t0d1  ONLINE       0     0     0
             c0t0d2  ONLINE       0     0     0
             c0t0d3  ONLINE       0     0     0
         logs
           c0t0d4s0  ONLINE       0     0     0
         cache
           c0t0d4s1  ONLINE       0     0     0
         spares
           c0t0d6    AVAIL
           c0t0d7    AVAIL
 
                 capacity     operations    bandwidth
  pool          used  avail   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  dpool        72.1G  3.55T    237     12  29.7M   597K
    raidz1     72.1G  3.55T    237      9  29.7M   469K
      c0t0d0       -      -    166      3  7.39M   157K
      c0t0d1       -      -    166      3  7.44M   157K
      c0t0d2       -      -    166      3  7.39M   157K
      c0t0d3       -      -    167      3  7.45M   157K
    c0t0d4s0      20K  4.97G      0      3      0   127K
  cache            -      -      -      -      -      -
    c0t0d4s1   17.6G  36.4G      3      1   249K   119K
  -----------  -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
 although iostat shows there is just write traffic hitting the SSD:
 
                    extended device statistics                  cpu
   device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
   sd0     170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
   sd1     168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
   sd2     172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
   sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
   sd4     170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
   sd5       1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31

 iostat has a -n option, which is very useful for looking at device names
 :-)

 The SSD here is performing well.  The rest are clobbered.  A 205 millisecond
 response time will be agonizingly slow.
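
 For example (standard Solaris iostat options, 5-second interval):

   # -x for extended statistics, -n to show c0t0d4-style device names
   # instead of sdN instance names
   iostat -xn 5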

 By default, for this version of ZFS, up to 35 I/Os will be queued to the
 disk, which is why you see 35.0 in the actv column. The combination
 of actv=35 and svc_t > 200 indicates that this is the place to start working.
 Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
 This will reduce the concurrent load on the disks, thus reducing svc_t.

 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
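
 A sketch of how that tuning is usually applied (the value 4 is illustrative;
 see the guide above for caveats):

   # inspect the current value, then lower it on the running kernel
   echo zfs_vdev_max_pending/D | mdb -k
   echo zfs_vdev_max_pending/W0t4 | mdb -kw

   # to persist the change across reboots, add to /etc/system:
   # set zfs:zfs_vdev_max_pending = 4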

  -- richard

  

Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Tracey Bernath
Thanks Brendan,
I was going to move it over to an 8 KB block size once I got through this
index rebuild. My thinking was that a disproportionate block size would show
up as excessive I/O throughput, not a lack of throughput.

The question about the cache comes from the fact that the 18GB or so that it
says is in the cache IS the database. This was why I was thinking the index
rebuild should be CPU constrained, and I should see a spike in reading from
the cache.  If the entire file is cached, why would it go to the disks at
all for the reads?

The disks are delivering about 30MB/s of reads, but this SSD is rated for
sustained 70MB/s, so there should be a chance to pick up 100% gain.

I've seen lots of mention of kernel settings, but those only seem to apply
to cache flushes on sync writes.

Any idea on where to look next? I've spent about a week tinkering with it.
I'm trying to get a major customer to switch over to ZFS and an open
storage solution, but I'm afraid that if I can't get it to work at the small
scale, I can't convince them at the large scale.

Thanks,
Tracey


On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems 
bren...@sun.com wrote:

 On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
  I have a similar question: I put together a cheapo RAID with four 1TB WD
 Black (7200) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
 slice 0 (5GB) for ZIL and the rest of the SSD for cache:
  # zpool status dpool
pool: dpool
   state: ONLINE
   scrub: none requested
  config:
 
         NAME        STATE     READ WRITE CKSUM
         dpool       ONLINE       0     0     0
           raidz1    ONLINE       0     0     0
             c0t0d0  ONLINE       0     0     0
             c0t0d1  ONLINE       0     0     0
             c0t0d2  ONLINE       0     0     0
             c0t0d3  ONLINE       0     0     0
         logs
           c0t0d4s0  ONLINE       0     0     0
         cache
           c0t0d4s1  ONLINE       0     0     0
         spares
           c0t0d6    AVAIL
           c0t0d7    AVAIL
 
                 capacity     operations    bandwidth
  pool          used  avail   read  write   read  write
  -----------  -----  -----  -----  -----  -----  -----
  dpool        72.1G  3.55T    237     12  29.7M   597K
    raidz1     72.1G  3.55T    237      9  29.7M   469K
      c0t0d0       -      -    166      3  7.39M   157K
      c0t0d1       -      -    166      3  7.44M   157K
      c0t0d2       -      -    166      3  7.39M   157K
      c0t0d3       -      -    167      3  7.45M   157K
    c0t0d4s0      20K  4.97G      0      3      0   127K
  cache            -      -      -      -      -      -
    c0t0d4s1   17.6G  36.4G      3      1   249K   119K
  -----------  -----  -----  -----  -----  -----  -----
  I just don't seem to be getting any bang for the buck I should be.  This
 was taken while rebuilding an Oracle index, all files stored in this pool.
  The WD disks are at 100%, and nothing is coming from the cache.  The cache
 does have the entire DB cached (17.6G used), but hardly reads anything from
 it.  I also am not seeing the spike of data flowing into the ZIL either,
  although iostat shows there is just write traffic hitting the SSD:
 
                    extended device statistics                  cpu
   device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
   sd0     170.0    0.4 7684.7    0.0  0.0 35.0  205.3   0 100  11  8  0 82
   sd1     168.4    0.4 7680.2    0.0  0.0 34.6  205.1   0 100
   sd2     172.0    0.4 7761.7    0.0  0.0 35.0  202.9   0 100
   sd3       0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
   sd4     170.0    0.4 7727.1    0.0  0.0 35.0  205.3   0 100
   sd5       1.6    2.6  182.4  104.8  0.0  0.5  117.8   0  31
 
  Since this SSD is in a RAID array, and just presents as a regular disk
 LUN, is there a special incantation required to turn on the Turbo mode?
 
  Doesn't it seem that all this traffic should be maxing out the SSD? Reads
 from the cache, and writes to the ZIL? I have a second identical SSD I
 wanted to add as a mirror, but it seems pointless if there's no zip to be
 had...

 The most likely reason is that this workload has been identified as
 streaming by ZFS, which is prefetching from disk instead of reading from
 the L2ARC (l2arc_noprefetch=1).
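
 If you want prefetched (streaming) buffers to be eligible for the L2ARC, a
 sketch of the usual workaround on OpenSolaris (clear the tunable live, and
 optionally persist it):

   # allow prefetched buffers to be cached in the L2ARC on the running kernel
   echo l2arc_noprefetch/W0t0 | mdb -kw

   # to persist across reboots, add to /etc/system:
   # set zfs:l2arc_noprefetch = 0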

 It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle
 doing 128 Kbyte random I/O?  We usually tune that down before creating the
 database, which will use the L2ARC device more efficiently.
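
 A sketch of that tuning, with a hypothetical dataset name; recordsize only
 affects blocks written after the change, so set it before creating or
 copying in the datafiles:

   # match the ZFS recordsize to the Oracle block size (8 KB here)
   zfs set recordsize=8k dpool/oradata
   zfs get recordsize dpool/oradata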

 Brendan

 --
 Brendan Gregg, Fishworks
 http://blogs.sun.com/brendan




-- 
Tracey Bernath
913-488-6284
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss