Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-23 Thread Roch - PAE

Manoj Nayak writes:
  Hi All.
  
  The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
  saturate a single disk's bandwidth using enough concurrent 128K I/Os.
  The number of concurrent I/Os is decided by vq_max_pending. The default value
  for vq_max_pending is 35.

  We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
  record size is set to 128K. When we read/write a 128K record, ZFS issues a
  128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

  We need to saturate the bandwidth of all three data disks in the raid-z group. Is
  it required to set vq_max_pending to 35*3 = 105?
  

Nope.

Once a disk controller is working on 35 requests, we don't
expect to get any more out of it by queueing more requests
and we might even confuse the firmware and get less.

Now, for an array controller and a vdev fronting a large
number of disks, 35 might be too low a number to allow
full throughput. Rather than tuning 35 up, we suggest
splitting devices into smaller LUNs, since each LUN is given
a 35-deep queue.

Tuning vq_max_pending down helps read and synchronous write
(ZIL) latency. Today the preferred way to help ZIL latency
is to use a Separate Intent Log.
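
As a concrete sketch (the pool and device names here are made up, and this
assumes a pool version recent enough to support log devices), a separate
intent log is attached with zpool add:

   # attach a dedicated log (slog) device to an existing pool
   zpool add tank log c4t7d0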

-r


  Thanks
  Manoj Nayak



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-23 Thread Manoj Nayak
Roch - PAE wrote:
 Manoj Nayak writes:
   Hi All.
   
   The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
   saturate a single disk's bandwidth using enough concurrent 128K I/Os.
   The number of concurrent I/Os is decided by vq_max_pending. The default value
   for vq_max_pending is 35.

   We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
   record size is set to 128K. When we read/write a 128K record, ZFS issues a
   128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

   We need to saturate the bandwidth of all three data disks in the raid-z group. Is
   it required to set vq_max_pending to 35*3 = 105?
   

 Nope.

 Once a disk controller is working on 35 requests, we don't
 expect to get any more out of it by queueing more requests
 and we might even confuse the firmware and get less.

 Now, for an array controller and a vdev fronting a large
 number of disks, 35 might be too low a number to allow
 full throughput. Rather than tuning 35 up, we suggest
 splitting devices into smaller LUNs, since each LUN is given
 a 35-deep queue.

   
It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS as a
single device (vdev), and ZFS assigns a vq_max_pending value of 35 to this vdev.
To get higher throughput, do I need to do one of the following?

1. Reduce the number of disks in the raidz group from four to three, so that the
same pending queue of 35 is available for a smaller number of disks.
Or
2. Create slices out of a physical disk and create the raidz group out of four
slices of the physical disk, so that the same pending queue of 35 is available
to four slices of one physical disk.

Thanks
Manoj Nayak
 Tuning vq_max_pending down helps read and synchronous write
 (ZIL) latency. Today the preferred way to help ZIL latency
 is to use a Separate Intent Log.

 -r


   Thanks
   Manoj Nayak

   



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-23 Thread Will Murnane
On Jan 23, 2008 6:36 AM, Manoj Nayak [EMAIL PROTECTED] wrote:
 It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS as a
 single device (vdev), and ZFS assigns a vq_max_pending value of 35 to this vdev.
 To get higher throughput, do I need to do one of the following?

 1. Reduce the number of disks in the raidz group from four to three, so that the
 same pending queue of 35 is available for a smaller number of disks.
 Or
 2. Create slices out of a physical disk and create the raidz group out of four
 slices of the physical disk, so that the same pending queue of 35 is available
 to four slices of one physical disk.
Or switch to mirrors instead, if you can live with the capacity hit.
Mirrors will have much better random read performance than raidz,
since they don't need to read from every disk to make sure the
checksum matches.
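For illustration (device names here are hypothetical), the same four disks
could be laid out as two striped mirror pairs rather than one raidz group:

   # two 2-way mirrors striped together instead of a 4-disk raidz
   zpool create tank mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0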

Will


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-23 Thread Richard Elling
Manoj Nayak wrote:
 Roch - PAE wrote:
   
 Manoj Nayak writes:
   Hi All.
   
   The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
   saturate a single disk's bandwidth using enough concurrent 128K I/Os.
   The number of concurrent I/Os is decided by vq_max_pending. The default value
   for vq_max_pending is 35.

   We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
   record size is set to 128K. When we read/write a 128K record, ZFS issues a
   128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

   We need to saturate the bandwidth of all three data disks in the raid-z group. Is
   it required to set vq_max_pending to 35*3 = 105?
   

 Nope.

 Once a disk controller is working on 35 requests, we don't
 expect to get any more out of it by queueing more requests
 and we might even confuse the firmware and get less.

 Now, for an array controller and a vdev fronting a large
 number of disks, 35 might be too low a number to allow
 full throughput. Rather than tuning 35 up, we suggest
 splitting devices into smaller LUNs, since each LUN is given
 a 35-deep queue.

   
 
 It means the 4-disk raid-z group inside the ZFS pool is exported to ZFS as a
 single device (vdev), and ZFS assigns a vq_max_pending value of 35 to this vdev.
 To get higher throughput, do I need to do one of the following?
   

This is not the terminology we use to describe ZFS.  Quite simply,
a storage pool contains devices configured in some way, hopefully
using  some form of data protection (mirror, raidz[12]) -- see zpool(1m).
Each storage pool can contain one or more file systems or volumes --
see zfs(1m).

The term export is used to describe transition of ownership of a
storage pool between different hosts.
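
A minimal sketch of that terminology, with hypothetical pool and device names:

   # a storage pool whose single top-level vdev is a 4-disk raidz group
   zpool create tank raidz c0t0d0 c0t1d0 c0t2d0 c0t3d0
   # file systems are created inside the pool
   zfs create tank/data
   # export/import is what moves ownership of the pool between hosts
   zpool export tank
   zpool import tank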

 1. Reduce the number of disks in the raidz group from four to three, so that the
 same pending queue of 35 is available for a smaller number of disks.
 Or
   

35 is for each physical disk.

 2. Create slices out of a physical disk and create the raidz group out of four
 slices of the physical disk, so that the same pending queue of 35 is available
 to four slices of one physical disk.
   

This will likely have a negative scaling effect.  Some devices, especially
raw disks, have wimpy microprocessors and limited memory.  You can
easily overload them and see the response time increase dramatically,
just as queuing theory will suggest.  Some research has shown that a
value of 8-16 is better, at least for some storage devices.   A value of 1
is perhaps too low, at least for devices which can handle multiple
outstanding I/Os.
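
If you do want to experiment in that 8-16 range, the tunable behind
vq_max_pending was typically exposed as the kernel variable
zfs_vdev_max_pending; the value below is only an example and should be
verified against your own hardware:

   # /etc/system entry, takes effect at next boot
   set zfs:zfs_vdev_max_pending = 10

   # or adjust the running kernel with mdb (0t10 = decimal 10)
   echo zfs_vdev_max_pending/W0t10 | mdb -kw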

  My workload issues around 5000 MB of read I/O, and iopattern says around
55% of the I/Os are random in nature.
  I don't know how much prefetching through the track cache is going to
help here. Probably I can try disabling the vdev cache
  by setting 'zfs_vdev_cache_max' to 1.

We can't size something like this unless we also know the I/O
size.  If you are talking small iops, say 8 kBytes, then you'll
need lots of disks.  For larger iops, you may be able to get
by with fewer disks.
 -- richard



[zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Manoj Nayak
Hi All.

The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
saturate a single disk's bandwidth using enough concurrent 128K I/Os.
The number of concurrent I/Os is decided by vq_max_pending. The default value
for vq_max_pending is 35.

We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
record size is set to 128K. When we read/write a 128K record, ZFS issues a
128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

We need to saturate the bandwidth of all three data disks in the raid-z group. Is
it required to set vq_max_pending to 35*3 = 105?

Thanks
Manoj Nayak


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
Manoj Nayak wrote:
 Hi All.

 The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
 saturate a single disk's bandwidth using enough concurrent 128K I/Os.
 The number of concurrent I/Os is decided by vq_max_pending. The default value
 for vq_max_pending is 35.

 We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
 record size is set to 128K. When we read/write a 128K record, ZFS issues a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.
   

Yes, this is how it works for a read without errors.  For a write, you
should see 4 writes, each 128KBytes/3.  Writes may also be
coalesced, so you may see larger physical writes.

 We need to saturate the bandwidth of all three data disks in the raid-z group. Is
 it required to set vq_max_pending to 35*3 = 105?
   

No.  vq_max_pending applies to each vdev.  Use iostat to see what
the device load is.  For the commonly used Hitachi 500 GByte disks
in a thumper, the read media bandwidth is 31-64.8 MBytes/s.  Writes
will be about 80% of reads, or 24.8-51.8 MBytes/s.  In a thumper,
the disk bandwidth will be the limiting factor for the hardware.
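For example, a simple way to watch per-device load and queue depth:

   # extended per-device statistics every 5 seconds; actv is the average
   # number of commands actively being serviced by the device
   iostat -xn 5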
 -- richard



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread manoj nayak

 Manoj Nayak wrote:
 Hi All.

 The ZFS documentation says ZFS schedules its I/O in such a way that it manages to
 saturate a single disk's bandwidth using enough concurrent 128K I/Os.
 The number of concurrent I/Os is decided by vq_max_pending. The default value
 for vq_max_pending is 35.

 We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS record
 size is set to 128K. When we read/write a 128K record, ZFS issues a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate the bandwidth of all three data disks in the raid-z group. Is
 it required to set vq_max_pending to 35*3 = 105?


 No.  vq_max_pending applies to each vdev.

A 4-disk raidz group issues a 128K/3 = 42.6K I/O to each individual data disk. If 35
concurrent 128K I/Os are enough to saturate a disk (vdev),
then 35*3 = 105 concurrent 42K I/Os will be required to saturate the same disk.

Thanks
Manoj Nayak

Use iostat to see what
 the device load is.  For the commonly used Hitachi 500 GByte disks
 in a thumper, the read media bandwidth is 31-64.8 MBytes/s.  Writes
 will be about 80% of reads, or 24.8-51.8 MBytes/s.  In a thumper,
 the disk bandwidth will be the limiting factor for the hardware.
 -- richard

 



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
manoj nayak wrote:

 Manoj Nayak wrote:
 Hi All.

 The ZFS documentation says ZFS schedules its I/O in such a way that it manages
 to saturate a single disk's bandwidth using enough concurrent 128K I/Os.
 The number of concurrent I/Os is decided by vq_max_pending. The default
 value for vq_max_pending is 35.

 We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The ZFS
 record size is set to 128K. When we read/write a 128K record, ZFS issues a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate the bandwidth of all three data disks in the raid-z
 group. Is it required to set vq_max_pending to 35*3 = 105?


 No.  vq_max_pending applies to each vdev.

 A 4-disk raidz group issues a 128K/3 = 42.6K I/O to each individual data
 disk. If 35 concurrent 128K I/Os are enough to saturate a disk (vdev),
 then 35*3 = 105 concurrent 42K I/Os will be required to saturate the same
 disk.

ZFS doesn't know anything about disk saturation.  It will send
up to vq_max_pending  I/O requests per vdev (usually a vdev is a
disk). It will try to keep vq_max_pending I/O requests queued to
the vdev.

For writes, you should see them become coalesced, so rather than
sending 3 42.6kByte write requests to a vdev, you might see one
128kByte write request.

In other words, ZFS has an I/O scheduler which is responsible
for sending I/O requests to vdevs.
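
One way to observe the coalescing is to look at the sizes of the physical
I/Os actually issued to each device, e.g. with the DTrace io provider (run
as root; press Ctrl-C to print the aggregation):

   # distribution of physical I/O sizes, broken down per device
   dtrace -n 'io:::start { @[args[1]->dev_statname] = quantize(args[0]->b_bcount); }'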
 -- richard



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
manoj nayak wrote:

 - Original Message - From: Richard Elling 
 [EMAIL PROTECTED]
 To: manoj nayak [EMAIL PROTECTED]
 Cc: zfs-discuss@opensolaris.org
 Sent: Wednesday, January 23, 2008 7:20 AM
 Subject: Re: [zfs-discuss] ZFS vq_max_pending value ?


 manoj nayak wrote:

 Manoj Nayak wrote:
 Hi All.

 The ZFS documentation says ZFS schedules its I/O in such a way that it
 manages to saturate a single disk's bandwidth using enough
 concurrent 128K I/Os.
 The number of concurrent I/Os is decided by vq_max_pending. The default
 value for vq_max_pending is 35.

 We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The
 ZFS record size is set to 128K. When we read/write a 128K record, ZFS
 issues a 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate the bandwidth of all three data disks in the raid-z
 group. Is it required to set vq_max_pending to 35*3 = 105?


 No.  vq_max_pending applies to each vdev.

 A 4-disk raidz group issues a 128K/3 = 42.6K I/O to each individual data
 disk. If 35 concurrent 128K I/Os are enough to saturate a disk (vdev),
 then 35*3 = 105 concurrent 42K I/Os will be required to saturate the
 same disk.

 ZFS doesn't know anything about disk saturation.  It will send
 up to vq_max_pending  I/O requests per vdev (usually a vdev is a
 disk). It will try to keep vq_max_pending I/O requests queued to
 the vdev.

 If I can see the avg pending I/Os hitting my vq_max_pending limit,
 then raising the limit would be a good thing. I think it's due to the
 many 42K read I/Os to the individual disks in the 4-disk raidz group.

You're dealing with a queue here.  iostat's average pending I/Os represents
the queue depth.   Some devices can't handle a large queue.  In any
case, queuing theory applies.

Note that for reads, the disk will likely have a track cache, so it is
not a good assumption that a read I/O will require a media access.
 -- richard



Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Manoj Nayak

 A 4-disk raidz group issues a 128K/3 = 42.6K I/O to each individual data
 disk. If 35 concurrent 128K I/Os are enough to saturate a disk (vdev),
 then 35*3 = 105 concurrent 42K I/Os will be required to saturate the
 same disk.

 ZFS doesn't know anything about disk saturation.  It will send
 up to vq_max_pending  I/O requests per vdev (usually a vdev is a
 disk). It will try to keep vq_max_pending I/O requests queued to
 the vdev.

 If I can see the avg pending I/Os hitting my vq_max_pending limit,
 then raising the limit would be a good thing. I think it's due to the
 many 42K read I/Os to the individual disks in the 4-disk raidz group.

 You're dealing with a queue here.  iostat's average pending I/Os 
 represents
 the queue depth.   Some devices can't handle a large queue.  In any
 case, queuing theory applies.

 Note that for reads, the disk will likely have a track cache, so it is
 not a good assumption that a read I/O will require a media access.
My workload issues around 5000 MB of read I/O, and iopattern says around 55%
of the I/Os are random in nature.
I don't know how much prefetching through the track cache is going to help
here. Probably I can try disabling the vdev cache
by setting 'zfs_vdev_cache_max' to 1.
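
For what it's worth, a sketch of that tuning via /etc/system, on the
assumption that zfs_vdev_cache_max is the variable meant above (as I
understand it, only reads smaller than this value are inflated by the
vdev cache):

   # /etc/system -- effectively disable vdev cache read inflation
   set zfs:zfs_vdev_cache_max = 1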

Thanks
Manoj Nayak
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss