Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Fajar A. Nugraha
On Sun, Feb 14, 2010 at 12:51 PM, Tracey Bernath  wrote:
> I went from all four disks of the array at 100%, doing about 170 read
> IOPS/25MB/s
> to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
> off the cache drive (@ only 50% load).


> And, keep in mind this was on less than $1000 of hardware.

Really? Complete box and all, or is it just the disks? Because the four
disks alone should cost about $400. Did you use ECC RAM?

-- 
Fajar
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Richard Elling
On Feb 16, 2010, at 12:39 PM, Daniel Carosone wrote:
> On Mon, Feb 15, 2010 at 09:11:02PM -0600, Tracey Bernath wrote:
>> On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone  wrote:
>>> Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
>>> add more devices and let them load-balance.   This is especially true
>>> if you're sharing ssd writes with ZIL, as slices on the same devices.
>>> 
>> Well, the problem I am trying to solve is this: wouldn't it read 2x faster with
>> the mirror?  It seems once I can drive the single device to 10 queued
>> actions, and 100% busy, it would be more useful to have two channels to the
>> same data. Is ZFS not smart enough to understand that there are two
>> identical mirror devices in the cache to split requests to? Or, are you
>> saying that ZFS is smart enough to cache it in two places, although not
>> mirrored?
> 
> First, Bob is right, measurement trumps speculation.  Try it.
> 
> As for speculation, you're thinking only about reads.  I expect
> reading from l2arc devices will be the same as reading from any other
> zfs mirror, and largely the same in both cases above; load balanced
> across either device.  In the rare case of a bad read from unmirrored
> l2arc, data will be fetched from the pool, so mirroring l2arc doesn't
> add any resiliency benefit.
> 
> However, your cache needs to be populated and maintained as well, and
> this needs writes.  Twice as many of them for the mirror as for the
> "stripe"; with a mirror, half of what is written never needs to be read again. These
> writes go to the same SSD devices you're using for ZIL; on commodity
> SSDs, which are not well write-optimised, they may be hurting ZIL
> latency by making the SSD do more writing, stealing from the total
> IOPS count on the channel, and (as a lesser concern) adding wear
> cycles to the device.

The L2ARC writes are throttled to 8 MB/sec, except during cold start,
where the throttle is 16 MB/sec.  This should not be noticeable on the
channels.
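
For reference (and assuming the stock tunables on this build), those limits
are the l2arc_write_max and l2arc_write_boost variables, which can be
inspected on a live system with mdb:

  # echo l2arc_write_max/D | mdb -k     (bytes fed to the L2ARC per second; 8MB by default)
  # echo l2arc_write_boost/D | mdb -k   (extra allowance while the cache is still cold)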

> When you're already maxing out the IO, eliminating wasted cycles opens
> your bottleneck, even if only a little. 

+1 
 -- richard

> Once you reach steady state, I don't know how much turnover in l2arc
> contents you will have, and therefore how many extra writes we're
> talking about.  It may not be many, but they are unnecessary ones.  
> 
> Normally, we'd talk about measuring a potential benefit, and then
> choosing based on the results.  In this case, if I were you I'd
> eliminate the unnecessary writes, and measure the difference more as a
> matter of curiosity and research, since I was already set up to do so.
> 
> --
> Dan.
> 

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
http://nexenta-atlanta.eventbrite.com (March 15-17, 2010)



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Daniel Carosone
On Mon, Feb 15, 2010 at 09:11:02PM -0600, Tracey Bernath wrote:
> On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone  wrote:
> > Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
> > add more devices and let them load-balance.   This is especially true
> > if you're sharing ssd writes with ZIL, as slices on the same devices.
> >
> Well, the problem I am trying to solve is this: wouldn't it read 2x faster with
> the mirror?  It seems once I can drive the single device to 10 queued
> actions, and 100% busy, it would be more useful to have two channels to the
> same data. Is ZFS not smart enough to understand that there are two
> identical mirror devices in the cache to split requests to? Or, are you
> saying that ZFS is smart enough to cache it in two places, although not
> mirrored?

First, Bob is right, measurement trumps speculation.  Try it.

As for speculation, you're thinking only about reads.  I expect
reading from l2arc devices will be the same as reading from any other
zfs mirror, and largely the same in both cases above; load balanced
across either device.  In the rare case of a bad read from unmirrored
l2arc, data will be fetched from the pool, so mirroring l2arc doesn't
add any resiliency benefit.

However, your cache needs to be populated and maintained as well, and
this needs writes.  Twice as many of them for the mirror as for the
"stripe"; with a mirror, half of what is written never needs to be read again. These
writes go to the same SSD devices you're using for ZIL; on commodity
SSDs, which are not well write-optimised, they may be hurting ZIL
latency by making the SSD do more writing, stealing from the total
IOPS count on the channel, and (as a lesser concern) adding wear
cycles to the device.

When you're already maxing out the IO, eliminating wasted cycles opens
your bottleneck, even if only a little. 

Once you reach steady state, I don't know how much turnover in l2arc
contents you will have, and therefore how many extra writes we're
talking about.  It may not be many, but they are unnecessary ones.  

Normally, we'd talk about measuring a potential benefit, and then
choosing based on the results.  In this case, if I were you I'd
eliminate the unnecessary writes, and measure the difference more as a
matter of curiosity and research, since I was already set up to do so.

--
Dan.



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Bob Friesenhahn

On Mon, 15 Feb 2010, Tracey Bernath wrote:


If the device itself was full, and items were falling off the L2ARC, then I
could see having two separate cache devices, but since I am only at about 50%
utilization of the available capacity, and maxing out the IO, then mirroring
seemed smarter.

Am I missing something here?


I doubt it.  The only way to know for sure is to test it, but it seems 
unlikely to me that the ZFS implementors would fail to load-share the 
reads from mirrored L2ARC.  Richard's points about L2ARC bandwidth vs. 
pool disk bandwidth are still good ones.  L2ARC is all about read 
latency; it does not necessarily help with read bandwidth.  It is also 
useful to keep in mind that L2ARC offers at least 40x less bandwidth 
than ARC in RAM, so always populate RAM first if you can afford it.
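
A quick way to see how much each tier is actually holding, and whether the
L2ARC is getting hits at all (kstat statistic names assumed for this
OpenSolaris build):

  # kstat -p zfs:0:arcstats:size zfs:0:arcstats:c_max
  # kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses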


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-16 Thread Tracey Bernath
On Mon, Feb 15, 2010 at 5:51 PM, Daniel Carosone  wrote:

> On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
> > Now, to add the second SSD ZIL/L2ARC for a mirror.
>
> Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
> add more devices and let them load-balance.   This is especially true
> if you're sharing ssd writes with ZIL, as slices on the same devices.
>
Well, the problem I am trying to solve is this: wouldn't it read 2x faster with
the mirror?  It seems once I can drive the single device to 10 queued
actions, and 100% busy, it would be more useful to have two channels to the
same data. Is ZFS not smart enough to understand that there are two
identical mirror devices in the cache to split requests to? Or, are you
saying that ZFS is smart enough to cache it in two places, although not
mirrored?

If the device itself was full, and items were falling off the L2ARC, then I
could see having two separate cache devices, but since I am only at about
50% utilization of the available capacity, and maxing out the IO, then
mirroring seemed smarter.

Am I missing something here?

Tracey



> > I may even splurge for one more to get a three way mirror.
>
> With more devices, questions about selecting different devices
> appropriate for each purpose come into play.
>
> > Now I need a bigger server
>
> See? :)
>
> --
> Dan.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-15 Thread Daniel Carosone
On Sun, Feb 14, 2010 at 11:08:52PM -0600, Tracey Bernath wrote:
> Now, to add the second SSD ZIL/L2ARC for a mirror. 

Just be clear: mirror ZIL by all means, but don't mirror l2arc, just
add more devices and let them load-balance.   This is especially true
if you're sharing ssd writes with ZIL, as slices on the same devices.
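
Concretely, if the second SSD shows up as (say) c0t0d5 and is sliced the same
way as the first, that would be something like:

  # zpool attach dpool c0t0d4s0 c0t0d5s0   (mirror the existing slog slice)
  # zpool add dpool cache c0t0d5s1         (second, independent l2arc slice; no mirror)

ZFS load-balances reads and cache writes across multiple cache devices on its
own; there is nothing to configure.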

> I may even splurge for one more to get a three way mirror.

With more devices, questions about selecting different devices
appropriate for each purpose come into play.

> Now I need a bigger server

See? :)

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-15 Thread Tracey Bernath
For those following the saga:
With the prefetch problem fixed, and data coming off the L2ARC instead of
the disks, the system switched from IO-bound to CPU-bound. I opened up the
throttles with some explicit PARALLEL hints in the Oracle commands, and we
were finally able to max out the single SSD:


    r/s    w/s      kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
  826.0    3.2  104361.8   35.2   0.0   9.9     0.0    12.0   3 100  c0t0d4

So, when we maxed out the SSD cache, it was delivering 100+ MB/s and 830 IOPS
with 3.4 TB behind it in a 4-disk SATA RAIDZ1.

Still have to remap it to 8k blocks to get more efficiency, but for raw
numbers, it's exactly what I was looking for. Now, to add the second SSD
ZIL/L2ARC for a mirror. I may even splurge for one more to get a three-way
mirror. That will completely saturate the SCSI channel. Now I need a bigger
server...

Did I mention it was <$1000 for the whole setup? Bah-ha-ha-ha.

Tracey


On Sat, Feb 13, 2010 at 11:51 PM, Tracey Bernath wrote:

> OK, that was the magic incantation I was looking for:
> - changing the noprefetch option opened the floodgates to the L2ARC
> - changing the max queue depth relieved the wait time on the drives,
> although I may undo this again in the benchmarking since these drives all
> have NCQ
>
> I went from all four disks of the array at 100%, doing about 170 read
> IOPS/25MB/s
> to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
> off the cache drive (@ only 50% load).
> This bodes well for adding a second mirrored cache drive to push for the
> 1KIOPS.
>
> Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
> be ready
> for some production benchmarking.
>
>
> BEFORE:
>  device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
>  sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
>  sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
>  sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
>  sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
>  sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31
>


> AFTER:
> extended device statistics
>     r/s    w/s     kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
>     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d0
>     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d1
>     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d2
>     0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d3
>   285.2    0.8  36236.2   14.4   0.0   0.5     0.0     1.8   1  37  c0t0d4
>
>
> And, keep in mind this was on less than $1000 of hardware.
>
> Thanks for the pointers guys,
> Tracey
>
>
>
> On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling 
> wrote:
>
>> comment below...
>>
>> On Feb 12, 2010, at 2:25 PM, TMB wrote:
>> > I have a similar question, I put together a cheapo RAID with four 1TB WD
>> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
>> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
>> > # zpool status dpool
>> >  pool: dpool
>> > state: ONLINE
>> > scrub: none requested
>> > config:
>> >
>> >    NAME        STATE     READ WRITE CKSUM
>> >    dpool       ONLINE       0     0     0
>> >      raidz1    ONLINE       0     0     0
>> >        c0t0d0  ONLINE       0     0     0
>> >        c0t0d1  ONLINE       0     0     0
>> >        c0t0d2  ONLINE       0     0     0
>> >        c0t0d3  ONLINE       0     0     0
>> >    logs
>> >      c0t0d4s0  ONLINE       0     0     0
>> >    cache
>> >      c0t0d4s1  ONLINE       0     0     0
>> >    spares
>> >      c0t0d6    AVAIL
>> >      c0t0d7    AVAIL
>> >
>> >                capacity     operations    bandwidth
>> > pool         used  avail   read  write   read  write
>> > ----------  -----  -----  -----  -----  -----  -----
>> > dpool       72.1G  3.55T    237     12  29.7M   597K
>> >   raidz1    72.1G  3.55T    237      9  29.7M   469K
>> >     c0t0d0      -      -    166      3  7.39M   157K
>> >     c0t0d1      -      -    166      3  7.44M   157K
>> >     c0t0d2      -      -    166      3  7.39M   157K
>> >     c0t0d3      -      -    167      3  7.45M   157K
>> >   c0t0d4s0     20K  4.97G      0      3      0   127K
>> > cache           -      -      -      -      -      -
>> >   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
>> > ----------  -----  -----  -----  -----  -----  -----
>> > I just don't seem to be getting any bang for the buck I should be.  This
>> was taken while rebuilding an Oracle index, all files stored in this pool.
>>  The WD disks are at 100%, and nothing is coming from the cache.  The cache
>> does have the entire DB cached (17.6G used), but hardly reads anything from
>> it.  I also am not seeing the spike of data flowing into the ZIL either,
>> although iostat shows there is just write traffic hitting the SSD:
>> >

Re: [zfs-discuss] SSD and ZFS

2010-02-14 Thread Tracey Bernath
OK, that was the magic incantation I was looking for:
- changing the noprefetch option opened the floodgates to the L2ARC
- changing the max queue depth relieved the wait time on the drives, although
I may undo this again in the benchmarking since these drives all have NCQ

I went from all four disks of the array at 100%, doing about 170 read
IOPS/25MB/s
to all four disks of the array at 0%, once hitting nearly 500 IOPS/65MB/s
off the cache drive (@ only 50% load).
This bodes well for adding a second mirrored cache drive to push for the
1KIOPS.

Now I am ready to insert the mirror for the ZIL and the CACHE, and we will
be ready
for some production benchmarking.


 device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
 sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
 sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
 sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
 sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
 sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31

extended device statistics
    r/s    w/s     kr/s   kw/s  wait  actv  wsvc_t  asvc_t  %w  %b  device
    0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d0
    0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d1
    0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d2
    0.0    0.0      0.0    0.0   0.0   0.0     0.0     0.0   0   0  c0t0d3
  285.2    0.8  36236.2   14.4   0.0   0.5     0.0     1.8   1  37  c0t0d4


And, keep in mind this was on less than $1000 of hardware.

Thanks,
Tracey


On Sat, Feb 13, 2010 at 9:22 AM, Richard Elling wrote:

> comment below...
>
> On Feb 12, 2010, at 2:25 PM, TMB wrote:
> > I have a similar question, I put together a cheapo RAID with four 1TB WD
> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
> > # zpool status dpool
> >  pool: dpool
> > state: ONLINE
> > scrub: none requested
> > config:
> >
> >    NAME        STATE     READ WRITE CKSUM
> >    dpool       ONLINE       0     0     0
> >      raidz1    ONLINE       0     0     0
> >        c0t0d0  ONLINE       0     0     0
> >        c0t0d1  ONLINE       0     0     0
> >        c0t0d2  ONLINE       0     0     0
> >        c0t0d3  ONLINE       0     0     0
> >    logs
> >      c0t0d4s0  ONLINE       0     0     0
> >    cache
> >      c0t0d4s1  ONLINE       0     0     0
> >    spares
> >      c0t0d6    AVAIL
> >      c0t0d7    AVAIL
> >
> >                capacity     operations    bandwidth
> > pool         used  avail   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > dpool       72.1G  3.55T    237     12  29.7M   597K
> >   raidz1    72.1G  3.55T    237      9  29.7M   469K
> >     c0t0d0      -      -    166      3  7.39M   157K
> >     c0t0d1      -      -    166      3  7.44M   157K
> >     c0t0d2      -      -    166      3  7.39M   157K
> >     c0t0d3      -      -    167      3  7.45M   157K
> >   c0t0d4s0     20K  4.97G      0      3      0   127K
> > cache           -      -      -      -      -      -
> >   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> > ----------  -----  -----  -----  -----  -----  -----
> > I just don't seem to be getting any bang for the buck I should be.  This
> was taken while rebuilding an Oracle index, all files stored in this pool.
>  The WD disks are at 100%, and nothing is coming from the cache.  The cache
> does have the entire DB cached (17.6G used), but hardly reads anything from
> it.  I also am not seeing the spike of data flowing into the ZIL either,
> although iostat shows there is just write traffic hitting the SSD:
> >
> > extended device statistics  cpu
> > device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> > sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> > sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
> > sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
> > sd3         0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
> > sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
> > sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31
>
> iostat has an "-n" option, which is very useful for looking at device names
> :-)
>
> The SSD here is performing well.  The rest are clobbered; a 205-millisecond
> response time will be agonizingly slow.
>
> By default, for this version of ZFS, up to 35 I/Os will be queued to the
> disk, which is why you see 35.0 in the "actv" column. The combination
> of actv=35 and svc_t>200 indicates that this is the place to start working.
> Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
> This will reduce the concurrent load on the disks, thus reducing svc_t.
>
> http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Richard Elling
comment below...

On Feb 12, 2010, at 2:25 PM, TMB wrote:
> I have a similar question, I put together a cheapo RAID with four 1TB WD 
> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
> # zpool status dpool
>  pool: dpool
> state: ONLINE
> scrub: none requested
> config:
> 
>    NAME        STATE     READ WRITE CKSUM
>    dpool       ONLINE       0     0     0
>      raidz1    ONLINE       0     0     0
>        c0t0d0  ONLINE       0     0     0
>        c0t0d1  ONLINE       0     0     0
>        c0t0d2  ONLINE       0     0     0
>        c0t0d3  ONLINE       0     0     0
>    logs
>      c0t0d4s0  ONLINE       0     0     0
>    cache
>      c0t0d4s1  ONLINE       0     0     0
>    spares
>      c0t0d6    AVAIL
>      c0t0d7    AVAIL
>
>                capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> dpool       72.1G  3.55T    237     12  29.7M   597K
>   raidz1    72.1G  3.55T    237      9  29.7M   469K
>     c0t0d0      -      -    166      3  7.39M   157K
>     c0t0d1      -      -    166      3  7.44M   157K
>     c0t0d2      -      -    166      3  7.39M   157K
>     c0t0d3      -      -    167      3  7.45M   157K
>   c0t0d4s0     20K  4.97G      0      3      0   127K
> cache           -      -      -      -      -      -
>   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> ----------  -----  -----  -----  -----  -----  -----
> I just don't seem to be getting any bang for the buck I should be.  This was 
> taken while rebuilding an Oracle index, all files stored in this pool.  The 
> WD disks are at 100%, and nothing is coming from the cache.  The cache does 
> have the entire DB cached (17.6G used), but hardly reads anything from it.  I 
> also am not seeing the spike of data flowing into the ZIL either, although 
> iostat shows there is just write traffic hitting the SSD:
> 
> extended device statistics  cpu
> device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
> sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
> sd3         0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
> sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
> sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31

iostat has an "-n" option, which is very useful for looking at device names :-)

The SSD here is performing well.  The rest are clobbered; a 205-millisecond
response time will be agonizingly slow.

By default, for this version of ZFS, up to 35 I/Os will be queued to the
disk, which is why you see 35.0 in the "actv" column. The combination
of actv=35 and svc_t>200 indicates that this is the place to start working.
Begin by reducing zfs_vdev_max_pending from 35 to something like 1 to 4.
This will reduce the concurrent load on the disks, thus reducing svc_t.
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29
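
For example (assuming the tunable is still spelled this way on your build), you
can change it on the live kernel with mdb, or make it stick via /etc/system:

  # echo zfs_vdev_max_pending/W0t4 | mdb -kw
  # echo "set zfs:zfs_vdev_max_pending = 4" >> /etc/system

Then watch iostat -xn again; actv should drop toward the new limit and svc_t
should come down with it.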

 -- richard

> Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
> is there a special incantation required to turn on the Turbo mode?
> 
> Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
> the cache, and writes to the ZIL? I have a second identical SSD I wanted to 
> add as a mirror, but it seems pointless if there's no zip to be had...
> 
> help?
> 
> Thanks,
> Tracey
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-13 Thread Tracey Bernath
Thanks, Brendan.
I was going to move it over to an 8 KB block size once I got through this index
rebuild. My thinking was that a disproportionate block size would show up as
excessive IO throughput, not a lack of it.

The question about the cache comes from the fact that the 18GB or so that it
says is in the cache IS the database. This was why I was thinking the index
rebuild should be CPU constrained, and I should see a spike in reading from
the cache.  If the entire file is cached, why would it go to the disks at
all for the reads?

The disks are delivering about 30MB/s of reads, but this SSD is rated for
sustained 70MB/s, so there should be a chance to pick up 100% gain.

I've seen lots of mention of kernel settings, but those only seem to apply
to cache flushes on sync writes.

Any idea on where to look next? I've spent about a week tinkering with it.
I'm trying to get a major customer to switch over to ZFS and an open
storage solution, but I'm afraid that if I can't get it to work at the small
scale, I can't convince them about the large scale.

Thanks,
Tracey


On Fri, Feb 12, 2010 at 4:43 PM, Brendan Gregg - Sun Microsystems <
bren...@sun.com> wrote:

> On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
> > I have a similar question, I put together a cheapo RAID with four 1TB WD
> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with
> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
> > # zpool status dpool
> >   pool: dpool
> >  state: ONLINE
> >  scrub: none requested
> > config:
> >
> >     NAME        STATE     READ WRITE CKSUM
> >     dpool       ONLINE       0     0     0
> >       raidz1    ONLINE       0     0     0
> >         c0t0d0  ONLINE       0     0     0
> >         c0t0d1  ONLINE       0     0     0
> >         c0t0d2  ONLINE       0     0     0
> >         c0t0d3  ONLINE       0     0     0
> >     logs
> >       c0t0d4s0  ONLINE       0     0     0
> >     cache
> >       c0t0d4s1  ONLINE       0     0     0
> >     spares
> >       c0t0d6    AVAIL
> >       c0t0d7    AVAIL
> >
> >                 capacity     operations    bandwidth
> > pool         used  avail   read  write   read  write
> > ----------  -----  -----  -----  -----  -----  -----
> > dpool       72.1G  3.55T    237     12  29.7M   597K
> >   raidz1    72.1G  3.55T    237      9  29.7M   469K
> >     c0t0d0      -      -    166      3  7.39M   157K
> >     c0t0d1      -      -    166      3  7.44M   157K
> >     c0t0d2      -      -    166      3  7.39M   157K
> >     c0t0d3      -      -    167      3  7.45M   157K
> >   c0t0d4s0     20K  4.97G      0      3      0   127K
> > cache           -      -      -      -      -      -
> >   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> > ----------  -----  -----  -----  -----  -----  -----
> > I just don't seem to be getting any bang for the buck I should be.  This
> was taken while rebuilding an Oracle index, all files stored in this pool.
>  The WD disks are at 100%, and nothing is coming from the cache.  The cache
> does have the entire DB cached (17.6G used), but hardly reads anything from
> it.  I also am not seeing the spike of data flowing into the ZIL either,
> although iostat shows there is just write traffic hitting the SSD:
> >
> >  extended device statistics  cpu
> > device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> > sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> > sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
> > sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
> > sd3         0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
> > sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
> > sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31
> >
> > Since this SSD is in a RAID array, and just presents as a regular disk
> LUN, is there a special incantation required to turn on the Turbo mode?
> >
> > Doesn't it seem that all this traffic should be maxing out the SSD? Reads
> from the cache, and writes to the ZIL? I have a second identical SSD I
> wanted to add as a mirror, but it seems pointless if there's no zip to be
> had...
>
> The most likely reason is that this workload has been identified as
> streaming by ZFS, which is prefetching from disk instead of the L2ARC
> (l2arc_noprefetch=1).
>
> It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle
> doing 128 Kbyte random I/O?  We usually tune that down before creating
> the database, which will use the L2ARC device more efficiently.
>
> Brendan
>
> --
> Brendan Gregg, Fishworks
> http://blogs.sun.com/brendan
>



-- 
Tracey Bernath
913-488-6284
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Brendan Gregg - Sun Microsystems
On Fri, Feb 12, 2010 at 02:25:51PM -0800, TMB wrote:
> I have a similar question, I put together a cheapo RAID with four 1TB WD 
> Black (7200) SATAs, in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with 
> slice 0 (5GB) for ZIL and the rest of the SSD  for cache:
> # zpool status dpool
>   pool: dpool
>  state: ONLINE
>  scrub: none requested
> config:
> 
>     NAME        STATE     READ WRITE CKSUM
>     dpool       ONLINE       0     0     0
>       raidz1    ONLINE       0     0     0
>         c0t0d0  ONLINE       0     0     0
>         c0t0d1  ONLINE       0     0     0
>         c0t0d2  ONLINE       0     0     0
>         c0t0d3  ONLINE       0     0     0
>     logs
>       c0t0d4s0  ONLINE       0     0     0
>     cache
>       c0t0d4s1  ONLINE       0     0     0
>     spares
>       c0t0d6    AVAIL
>       c0t0d7    AVAIL
>
>                 capacity     operations    bandwidth
> pool         used  avail   read  write   read  write
> ----------  -----  -----  -----  -----  -----  -----
> dpool       72.1G  3.55T    237     12  29.7M   597K
>   raidz1    72.1G  3.55T    237      9  29.7M   469K
>     c0t0d0      -      -    166      3  7.39M   157K
>     c0t0d1      -      -    166      3  7.44M   157K
>     c0t0d2      -      -    166      3  7.39M   157K
>     c0t0d3      -      -    167      3  7.45M   157K
>   c0t0d4s0     20K  4.97G      0      3      0   127K
> cache           -      -      -      -      -      -
>   c0t0d4s1  17.6G  36.4G      3      1   249K   119K
> ----------  -----  -----  -----  -----  -----  -----
> I just don't seem to be getting any bang for the buck I should be.  This was 
> taken while rebuilding an Oracle index, all files stored in this pool.  The 
> WD disks are at 100%, and nothing is coming from the cache.  The cache does 
> have the entire DB cached (17.6G used), but hardly reads anything from it.  I 
> also am not seeing the spike of data flowing into the ZIL either, although 
> iostat shows there is just write traffic hitting the SSD:
> 
>  extended device statistics  cpu
> device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
> sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
> sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
> sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
> sd3         0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
> sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
> sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31
> 
> Since this SSD is in a RAID array, and just presents as a regular disk LUN, 
> is there a special incantation required to turn on the Turbo mode?
> 
> Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
> the cache, and writes to the ZIL? I have a second identical SSD I wanted to 
> add as a mirror, but it seems pointless if there's no zip to be had...

The most likely reason is that this workload has been identified as streaming
by ZFS, which is prefetching from disk instead of the L2ARC (l2arc_noprefetch=1).

It also looks like you've used a 128 Kbyte ZFS record size.  Is Oracle doing
128 Kbyte random I/O?  We usually tune that down before creating the database,
which will use the L2ARC device more efficiently.
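
For illustration only (the dataset name and block size here are assumptions,
not something from your post), those two knobs would be adjusted roughly like
this:

  # echo l2arc_noprefetch/W0t0 | mdb -kw    (let prefetched/streaming reads be cached in L2ARC)
  # zfs set recordsize=8k dpool/oracle      (match the DB block size; only affects files
                                             written after the change, so re-copy the datafiles)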

Brendan

-- 
Brendan Gregg, Fishworks   http://blogs.sun.com/brendan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread TMB
I have a similar question. I put together a cheapo RAID with four 1TB WD Black 
(7200 RPM) SATAs in a 3TB RAIDZ1, and I added a 64GB OCZ Vertex SSD, with slice 0 
(5GB) for ZIL and the rest of the SSD for cache:
# zpool status dpool
  pool: dpool
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        dpool       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t0d1  ONLINE       0     0     0
            c0t0d2  ONLINE       0     0     0
            c0t0d3  ONLINE       0     0     0
        logs
          c0t0d4s0  ONLINE       0     0     0
        cache
          c0t0d4s1  ONLINE       0     0     0
        spares
          c0t0d6    AVAIL
          c0t0d7    AVAIL

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
dpool       72.1G  3.55T    237     12  29.7M   597K
  raidz1    72.1G  3.55T    237      9  29.7M   469K
    c0t0d0      -      -    166      3  7.39M   157K
    c0t0d1      -      -    166      3  7.44M   157K
    c0t0d2      -      -    166      3  7.39M   157K
    c0t0d3      -      -    167      3  7.45M   157K
  c0t0d4s0     20K  4.97G      0      3      0   127K
cache           -      -      -      -      -      -
  c0t0d4s1  17.6G  36.4G      3      1   249K   119K
----------  -----  -----  -----  -----  -----  -----
I just don't seem to be getting any bang for the buck I should be.  This was 
taken while rebuilding an Oracle index, all files stored in this pool.  The WD 
disks are at 100%, and nothing is coming from the cache.  The cache does have 
the entire DB cached (17.6G used), but hardly reads anything from it.  I also 
am not seeing the spike of data flowing into the ZIL either, although iostat 
shows there is just write traffic hitting the SSD:

 extended device statistics  cpu
device      r/s    w/s    kr/s   kw/s  wait  actv  svc_t  %w  %b  us sy wt id
sd0       170.0    0.4  7684.7    0.0   0.0  35.0  205.3   0 100  11  8  0 82
sd1       168.4    0.4  7680.2    0.0   0.0  34.6  205.1   0 100
sd2       172.0    0.4  7761.7    0.0   0.0  35.0  202.9   0 100
sd3         0.0    0.0     0.0    0.0   0.0   0.0    0.0   0   0
sd4       170.0    0.4  7727.1    0.0   0.0  35.0  205.3   0 100
sd5         1.6    2.6   182.4  104.8   0.0   0.5  117.8   0  31

Since this SSD is in a RAID array, and just presents as a regular disk LUN, is 
there a special incantation required to turn on the Turbo mode?

Doesn't it seem that all this traffic should be maxing out the SSD? Reads from 
the cache, and writes to the ZIL? I have a second identical SSD I wanted to add 
as a mirror, but it seems pointless if there's no zip to be had...

help?

Thanks,
Tracey
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD and ZFS

2010-02-12 Thread Scott Meilicke
I don't think adding an SSD mirror to an existing pool will do much for 
performance. Some of your data will surely go to those SSDs, but I don't think 
Solaris will know they are SSDs and move blocks in and out according to 
usage patterns to give you an all-around boost. They will just be used to store 
data, nothing more.

Perhaps it will be more useful to add the SSDs as either an L2ARC or a SLOG for 
the ZIL, but that will depend upon your workload. If you do NFS or iSCSI 
access, then putting the ZIL onto the SSD drive(s) will speed up writes. Adding 
them as L2ARC will speed up reads.
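
For instance, with placeholder device names on a pool called tank, the two
options look something like:

  # zpool add tank log mirror c2t0d0 c2t1d0   (SLOG for the ZIL; helps sync writes)
  # zpool add tank cache c2t2d0 c2t3d0        (L2ARC; helps reads of the cached working set)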

Here is the ZFS best practices guide, which should help with this decision:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide

Read that, then come back with more questions.

Best,
Scott
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss