Re: [zfs-discuss] Odd prioritisation issues.

2007-12-12 Thread Dickon Hood
On Wed, Dec 12, 2007 at 10:27:56 +0100, Roch - PAE wrote:

: O_DSYNC was a good idea. Then, if you have a recent Nevada
: build, you can use the separate intent log (the "log" keyword
: in zpool create) to absorb those writes without spindle
: competition from the reads. Your write workload should then
: be well handled (unless the incoming network processing is
: itself delayed).

Thanks for the suggestion -- I'll see if we can give that a go.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-12 Thread Roch - PAE
Dickon Hood writes:
 > On Fri, Dec 07, 2007 at 13:14:56 +0000, I wrote:
 > : On Fri, Dec 07, 2007 at 12:58:17 +0000, Darren J Moffat wrote:
 > : : Dickon Hood wrote:
 > : : >On Fri, Dec 07, 2007 at 12:38:11 +0000, Darren J Moffat wrote:
 > : : >: Dickon Hood wrote:
 > 
 > : : >: >We're seeing the writes stall in favour of the reads.  For normal
 > : : >: >workloads I can understand the reasons, but I was under the
 > : : >: >impression that real-time processes essentially trump all others,
 > : : >: >and I'm surprised by this behaviour; I had a dozen or so
 > : : >: >RT-processes sat waiting for disc for about 20s.
 > 
 > : : >: Are the files opened with O_DSYNC or does the application call fsync ?
 > 
 > : : >No.  O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND.  Would that help?
 > 
 > : : Don't know if it will help, but it will be different :-).  I suspected 
 > : : that since you put the processes in the RT class you would also be doing 
 > : : synchronous writes.
 > 
 > : Right.  I'll let you know on Monday; I'll need to restart it in the
 > : morning.
 > 
 > I was a tad busy yesterday and didn't have the time, but I've switched one
 > of our recorder processes (the one doing the HD stream; ~17Mb/s,
 > broadcasting a preview we don't mind trashing) to a version of the code
 > which opens its file O_DSYNC as suggested.
 > 
 > We've gone from ~130 write ops per second and 10MB/s to ~450 write ops per
 > second and 27MB/s, with a marginally higher CPU usage.  This is roughly
 > what I'd expect.
 > 
 > We've artificially throttled the reads, which has helped (but not fixed;
 > it isn't as deterministic as we'd like) the starvation problem, at the
 > expense of increasing a latency we'd rather have as close to zero as
 > possible.
 > 
 > Any ideas?
 > 

O_DSYNC was a good idea. Then, if you have a recent Nevada
build, you can use the separate intent log (the "log" keyword
in zpool create) to absorb those writes without spindle
competition from the reads. Your write workload should then
be well handled (unless the incoming network processing is
itself delayed).
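
For reference, a rough sketch of the zpool syntax -- the device names
here are made up, and this assumes a build recent enough to support
separate log devices:

   # create a pool with a dedicated intent-log device, so synchronous
   # (O_DSYNC) writes land on their own spindle, away from the reads
   zpool create content c0t0d0 c0t1d0 log c0t2d0

   # or retrofit a log device onto an existing pool
   zpool add content log c0t2d0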


-r


 > Thanks.
 > 
 > -- 
 > Dickon Hood
 > 
 > Due to digital rights management, my .sig is temporarily unavailable.
 > Normal service will be resumed as soon as possible.  We apologise for the
 > inconvenience in the meantime.
 > 
 > No virus was found in this outgoing message as I didn't bother looking.
 > 



Re: [zfs-discuss] Odd prioritisation issues.

2007-12-11 Thread Dickon Hood
On Fri, Dec 07, 2007 at 13:14:56 +0000, I wrote:
: On Fri, Dec 07, 2007 at 12:58:17 +0000, Darren J Moffat wrote:
: : Dickon Hood wrote:
: : >On Fri, Dec 07, 2007 at 12:38:11 +0000, Darren J Moffat wrote:
: : >: Dickon Hood wrote:

: : >: >We're seeing the writes stall in favour of the reads.  For normal
: : >: >workloads I can understand the reasons, but I was under the impression
: : >: >that real-time processes essentially trump all others, and I'm surprised
: : >: >by this behaviour; I had a dozen or so RT-processes sat waiting for disc
: : >: >for about 20s.

: : >: Are the files opened with O_DSYNC or does the application call fsync ?

: : >No.  O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND.  Would that help?

: : Don't know if it will help, but it will be different :-).  I suspected 
: : that since you put the processes in the RT class you would also be doing 
: : synchronous writes.

: Right.  I'll let you know on Monday; I'll need to restart it in the
: morning.

I was a tad busy yesterday and didn't have the time, but I've switched one
of our recorder processes (the one doing the HD stream; ~17Mb/s,
broadcasting a preview we don't mind trashing) to a version of the code
which opens its file O_DSYNC as suggested.
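
In case it's useful, the change amounts to one extra flag at open()
time -- a minimal sketch, with the path and mode made up:

   /* before: O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND
    * after:  the same flags plus O_DSYNC, so each write() returns
    * only once the data has reached stable storage */
   #include <fcntl.h>

   int fd = open("/content/hd-preview.ts",
                 O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND|O_DSYNC,
                 0644);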

We've gone from ~130 write ops per second and 10MB/s to ~450 write ops per
second and 27MB/s, with a marginally higher CPU usage.  This is roughly
what I'd expect.

We've artificially throttled the reads, which has helped (but not fixed;
it isn't as deterministic as we'd like) the starvation problem, at the
expense of increasing a latency we'd rather have as close to zero as
possible.
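
(The throttle itself is nothing clever -- roughly the sketch below,
with the rate budget illustrative rather than our real number:)

   /* crude read-side pacing: after each chunk is read, sleep for
    * long enough that sustained throughput stays within budget */
   #include <unistd.h>

   static void pace_reads(size_t bytes_read, size_t bytes_per_sec)
   {
       double secs = (double)bytes_read / (double)bytes_per_sec;
       usleep((useconds_t)(secs * 1e6));
   }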

Any ideas?

Thanks.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.



Re: [zfs-discuss] Odd prioritisation issues.

2007-12-08 Thread Dickon Hood
On Fri, Dec 07, 2007 at 05:27:25 -0800, Anton B. Rang wrote:
: > I was under the impression that real-time processes essentially trump all
: > others, and I'm surprised by this behaviour; I had a dozen or so
: > RT-processes sat waiting for disc for about 20s.

: Process priorities on Solaris affect CPU scheduling, but not (currently)
: I/O scheduling nor memory usage.

Ah, hmm.  I hadn't appreciated that.  I'm surprised.

: > *  Is this a ZFS issue?  Would we be better using another filesystem?

: It is a ZFS issue, though depending on your I/O patterns, you might be
: able to see similar starvation on other file systems.  In general, other
: file systems issue I/O independently, so on average each process will
: make roughly equal forward progress on a continuous basis.  You still
: don't have guaranteed I/O rates (in the sense that XFS on SGI, for
: instance, provides).

That would make sense.  I've not seen this before on any other filesystem.

: > *  Is there any way to mitigate against it?  Reduce the number of iops
: >    available for reading, say?
: > *  Is there any way to disable or invert this behaviour?

: I'll let the ZFS developers tackle this one 

: ---

: Have you considered using two systems (or two virtual systems) to ensure
: that the writer isn't affected by reads? Some QFS customers use this
: configuration, with one system writing to disk and another system
: reading from the same disk. This requires the use of a SAN file system
: but it provides the potential for much greater (and controllable)
: throughput. If your I/O needs are modest (less than a few GB/second),
: this is overkill.

We're writing (currently) about 10MB/s; this may rise to about double that
if we add the other multiplexes.  We're taking the BBC's DVB content
off-air, splitting it into programme chunks, and moving it from the
machine that's doing the recording to a filestore.  As it's off-air
streams, we have no control over the inbound data -- it just arrives
whether we like it or not.  We do control the movement from the recorder
to the filestore, but as this is largely achieved via a Perl module
calling sendfile(), even that's mostly out of our hands.
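
(For the curious, the move is essentially the classic sendfile() loop;
a hand-wavy C equivalent of what the Perl module does under the hood,
with error handling and retries omitted:)

   #include <sys/sendfile.h>
   #include <sys/stat.h>
   #include <fcntl.h>
   #include <unistd.h>

   /* push one recorded chunk from the cache down an open socket */
   static int move_chunk(int sock_fd, const char *path)
   {
       int fd = open(path, O_RDONLY);
       if (fd < 0)
           return -1;

       struct stat st;
       fstat(fd, &st);

       off_t off = 0;
       while (off < st.st_size) {
           ssize_t n = sendfile(sock_fd, fd, &off, st.st_size - off);
           if (n <= 0)
               break;          /* real code would retry on EINTR */
       }
       close(fd);
       return (off == st.st_size) ? 0 : -1;
   }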

Definitely a headscratcher.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Anton B. Rang
> I was under the impression that real-time processes essentially trump all
> others, and I'm surprised by this behaviour; I had a dozen or so RT-processes
> sat waiting for disc for about 20s.

Process priorities on Solaris affect CPU scheduling, but not (currently) I/O 
scheduling nor memory usage.

> *  Is this a ZFS issue?  Would we be better using another filesystem?

It is a ZFS issue, though depending on your I/O patterns, you might be able to 
see similar starvation on other file systems.  In general, other file systems 
issue I/O independently, so on average each process will make roughly equal 
forward progress on a continuous basis.  You still don't have guaranteed I/O 
rates (in the sense that XFS on SGI, for instance, provides).

> *  Is there any way to mitigate against it?  Reduce the number of iops
>    available for reading, say?
> *  Is there any way to disable or invert this behaviour?

I'll let the ZFS developers tackle this one 

---

Have you considered using two systems (or two virtual systems) to ensure that 
the writer isn't affected by reads? Some QFS customers use this configuration, 
with one system writing to disk and another system reading from the same disk. 
This requires the use of a SAN file system but it provides the potential for 
much greater (and controllable) throughput. If your I/O needs are modest (less 
than a few GB/second), this is overkill.

Anton
 
 


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Dickon Hood
On Fri, Dec 07, 2007 at 12:58:17 +, Darren J Moffat wrote:
: Dickon Hood wrote:
: >On Fri, Dec 07, 2007 at 12:38:11 +, Darren J Moffat wrote:
: >: Dickon Hood wrote:
: >: >We've got an interesting application which involves receiving lots of
: >: >multicast groups, and writing the data to disc as a cache.  We're
: >: >currently using ZFS for this cache, as we're potentially dealing with a
: >: >couple of TB at a time.

: >: >The threads writing to the filesystem have real-time SCHED_FIFO
: >: >priorities
: >: >set to 25.  The processes recovering data from the cache and moving it
: >: >elsewhere are niced at +10.

: >: >We're seeing the writes stall in favour of the reads.  For normal
: >: >workloads I can understand the reasons, but I was under the impression
: >: >that real-time processes essentially trump all others, and I'm surprised
: >: >by this behaviour; I had a dozen or so RT-processes sat waiting for disc
: >: >for about 20s.

: >: Are the files opened with O_DSYNC or does the application call fsync ?

: >No.  O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND.  Would that help?

: Don't know if it will help, but it will be different :-).  I suspected 
: that since you put the processes in the RT class you would also be doing 
: synchronous writes.

Right.  I'll let you know on Monday; I'll need to restart it in the
morning.

I put the processes in the RT class because without it they dropped
packets once in a while, especially on lesser hardware (a Netra T1 can't
cope without it; a Niagara usually can...).  Very odd.
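
(For reference, this is the standard POSIX route into the RT class for
a thread -- a sketch rather than our exact code:)

   #include <pthread.h>
   #include <sched.h>

   /* put the calling thread in the real-time FIFO class at prio 25 */
   static int make_realtime(void)
   {
       struct sched_param sp;
       sp.sched_priority = 25;
       return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
   }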

: If you can test this it may be worth doing so for the sake of gathering 
: another data point.

Noted.  I suspect (from reading the man pages) it won't make much
difference, as to my mind it looks like a scheduling issue.  Just for
interest's sake: when everything is behaving normally (writing only),
'zpool iostat 10' looks like:

               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
content     56.9G  2.66T      0    118      0  9.64M

normally, whilst reading and writing it looks like:

content     69.8G  2.65T    435    103  54.3M  9.63M

and when everything breaks, it looks like:

content      119G  2.60T    564      0  66.3M      0

prstat usually shows processes idling, one at priority 125 for a moment,
and other behaviour that I'd expect.  When it all breaks, I get most of
them sat at priority 125, thumbtwiddling.

Perplexing.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Darren J Moffat
Dickon Hood wrote:
> On Fri, Dec 07, 2007 at 12:38:11 +0000, Darren J Moffat wrote:
> : Dickon Hood wrote:
> : >We've got an interesting application which involves receiving lots of
> : >multicast groups, and writing the data to disc as a cache.  We're
> : >currently using ZFS for this cache, as we're potentially dealing with a
> : >couple of TB at a time.
> 
> : >The threads writing to the filesystem have real-time SCHED_FIFO priorities
> : >set to 25.  The processes recovering data from the cache and moving it
> : >elsewhere are niced at +10.
> 
> : >We're seeing the writes stall in favour of the reads.  For normal
> : >workloads I can understand the reasons, but I was under the impression
> : >that real-time processes essentially trump all others, and I'm surprised
> : >by this behaviour; I had a dozen or so RT-processes sat waiting for disc
> : >for about 20s.
> 
> : Are the files opened with O_DSYNC or does the application call fsync ?
> 
> No.  O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND.  Would that help?

Don't know if it will help, but it will be different :-).  I suspected 
that since you put the processes in the RT class you would also be doing 
synchronous writes.

If you can test this it may be worth doing so for the sake of gathering 
another data point.

-- 
Darren J Moffat


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Dickon Hood
On Fri, Dec 07, 2007 at 12:38:11 +0000, Darren J Moffat wrote:
: Dickon Hood wrote:
: >We've got an interesting application which involves receiving lots of
: >multicast groups, and writing the data to disc as a cache.  We're
: >currently using ZFS for this cache, as we're potentially dealing with a
: >couple of TB at a time.

: >The threads writing to the filesystem have real-time SCHED_FIFO priorities
: >set to 25.  The processes recovering data from the cache and moving it
: >elsewhere are niced at +10.

: >We're seeing the writes stall in favour of the reads.  For normal
: >workloads I can understand the reasons, but I was under the impression
: >that real-time processes essentially trump all others, and I'm surprised
: >by this behaviour; I had a dozen or so RT-processes sat waiting for disc
: >for about 20s.

: Are the files opened with O_DSYNC or does the application call fsync ?

No.  O_WRONLY|O_CREAT|O_LARGEFILE|O_APPEND.  Would that help?

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.


Re: [zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Darren J Moffat
Dickon Hood wrote:
> We've got an interesting application which involves receiving lots of
> multicast groups, and writing the data to disc as a cache.  We're
> currently using ZFS for this cache, as we're potentially dealing with a
> couple of TB at a time.
> 
> The threads writing to the filesystem have real-time SCHED_FIFO priorities
> set to 25.  The processes recovering data from the cache and moving it
> elsewhere are niced at +10.
> 
> We're seeing the writes stall in favour of the reads.  For normal
> workloads I can understand the reasons, but I was under the impression
> that real-time processes essentially trump all others, and I'm surprised
> by this behaviour; I had a dozen or so RT-processes sat waiting for disc
> for about 20s.

Are the files opened with O_DSYNC or does the application call fsync ?

-- 
Darren J Moffat


[zfs-discuss] Odd prioritisation issues.

2007-12-07 Thread Dickon Hood
We've got an interesting application which involves receiving lots of
multicast groups, and writing the data to disc as a cache.  We're
currently using ZFS for this cache, as we're potentially dealing with a
couple of TB at a time.

The threads writing to the filesystem have real-time SCHED_FIFO priorities
set to 25.  The processes recovering data from the cache and moving it
elsewhere are niced at +10.

We're seeing the writes stall in favour of the reads.  For normal
workloads I can understand the reasons, but I was under the impression
that real-time processes essentially trump all others, and I'm surprised
by this behaviour; I had a dozen or so RT-processes sat waiting for disc
for about 20s.

My questions:

  *  Is this a ZFS issue?  Would we be better using another filesystem?

  *  Is there any way to mitigate against it?  Reduce the number of iops
 available for reading, say?

  *  Is there any way to disable or invert this behaviour?

  *  Is this a bug, or should it be considered one?

Thanks.

-- 
Dickon Hood

Due to digital rights management, my .sig is temporarily unavailable.
Normal service will be resumed as soon as possible.  We apologise for the
inconvenience in the meantime.

No virus was found in this outgoing message as I didn't bother looking.