Hi Sage,
On 02/26/2013 12:36 PM, Sage Weil wrote:
On Tue, 26 Feb 2013, Jim Schutt wrote:
I think the right solution is to make an option that will setsockopt on
SO_RCVBUF to some value (say, 256KB). I pushed a branch that does this,
wip-tcp. Do you mind checking to see if this addresses
On Thu, 28 Feb 2013, Jim Schutt wrote:
Hi Sage,
On 02/26/2013 12:36 PM, Sage Weil wrote:
On Tue, 26 Feb 2013, Jim Schutt wrote:
I think the right solution is to make an option that will setsockopt on
SO_RCVBUF to some value (say, 256KB). I pushed a branch that does this,
wip-tcp.
Hi Sage,
On 02/20/2013 05:12 PM, Sage Weil wrote:
Hi Jim,
I'm resurrecting an ancient thread here, but: we've just observed this on
another big cluster and remembered that this hasn't actually been fixed.
Sorry for the delayed reply - I missed this in a backlog
of unread email...
I
On Tue, 26 Feb 2013, Jim Schutt wrote:
I think the right solution is to make an option that will setsockopt on
SO_RCVBUF to some value (say, 256KB). I pushed a branch that does this,
wip-tcp. Do you mind checking to see if this addresses the issue (without
manually adjusting things
Hi Jim,
I'm resurrecting an ancient thread here, but: we've just observed this on
another big cluster and remembered that this hasn't actually been fixed.
I think the right solution is to make an option that will setsockopt on
SO_RCVBUF to some value (say, 256KB). I pushed a branch that does
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
I'm currently running 24 OSDs/server, one 1TB 7200 RPM SAS drive
per OSD. During a test I watch both OSD servers with both
vmstat and iostat.
During a good period, vmstat says
On Fri, Feb 24, 2012 at 07:38, Jim Schutt jasc...@sandia.gov wrote:
I've finally figured out what is going on with this behaviour.
Memory usage was on the right track.
It turns out to be an unfortunate interaction between the
number of OSDs/server, number of clients, TCP socket buffer
I created ticket http://tracker.newdream.net/issues/2100 for this.
On Fri, Feb 24, 2012 at 10:31, Tommi Virtanen tommi.virta...@dreamhost.com wrote:
On Fri, Feb 24, 2012 at 07:38, Jim Schutt jasc...@sandia.gov wrote:
I've finally figured out what is going on with this behaviour.
Memory usage
On 02/10/2012 05:05 PM, sridhar basam wrote:
But the server never ACKed that packet. Too busy?
I was collecting vmstat data during the run; here's the important bits:
Fri Feb 10 11:56:51 MST 2012
vmstat -w 8 16
procs ---memory-- ---swap-- -io
On 02/09/2012 06:26 PM, sridhar basam wrote:
Do you mind capturing to a pcap file and providing that? Makes it
easier to analyse things. If not, I understand. If you can do the
capture on both ends, do it with a snaplen of 68 so that you get all
of the headers and there shouldn't be too
On Fri, Feb 10, 2012 at 10:32 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/09/2012 06:26 PM, sridhar basam wrote:
Do you mind capturing to a pcap file and providing that? Makes it
easier to analyse things. If not, I understand. If you can do the
capture on both ends, do it with a
[ added Cc:netdev
See http://www.spinics.net/lists/ceph-devel/msg04804.html
for the start of the thread.
-- Jim
]
On 02/10/2012 10:13 AM, sridhar basam wrote:
On Fri, Feb 10, 2012 at 10:32 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/09/2012 06:26 PM, sridhar basam wrote:
Do you
On Fri, Feb 10, 2012 at 6:09 PM, Jim Schutt jasc...@sandia.gov wrote:
[ added Cc:netdev
See http://www.spinics.net/lists/ceph-devel/msg04804.html
for the start of the thread.
-- Jim
]
On 02/10/2012 10:13 AM, sridhar basam wrote:
On Fri, Feb 10, 2012 at 10:32 AM, Jim
On 02/06/2012 11:35 AM, Gregory Farnum wrote:
On Mon, Feb 6, 2012 at 10:20 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/06/2012 10:22 AM, Yehuda Sadeh Weinraub wrote:
On Mon, Feb 6, 2012 at 8:20 AM, Jim Schutt jasc...@sandia.gov wrote:
The above suggests to me that the slowdown is a
On 02/09/2012 03:40 PM, sridhar basam wrote:
On Thu, Feb 9, 2012 at 3:53 PM, Jim Schutt jasc...@sandia.gov wrote:
On 02/06/2012 11:35 AM, Gregory Farnum wrote:
On Mon, Feb 6, 2012 at 10:20 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/06/2012 10:22 AM, Yehuda Sadeh Weinraub wrote:
On
On Thu, Feb 9, 2012 at 15:15, Jim Schutt jasc...@sandia.gov wrote:
I suspect a bug in the stack, as at an application level I get
the same sort of stalls whether I use IP over ethernet or IPoIB.
I need to get traces for both cases to prove that it is the same
stall...
Hi. I just wanted to
On Thu, Feb 9, 2012 at 6:15 PM, Jim Schutt jasc...@sandia.gov wrote:
On 02/09/2012 03:40 PM, sridhar basam wrote:
On Thu, Feb 9, 2012 at 3:53 PM, Jim Schutt jasc...@sandia.gov wrote:
On 02/06/2012 11:35 AM, Gregory Farnum wrote:
On Mon, Feb 6, 2012 at 10:20 AM, Jim Schutt jasc...@sandia.gov
On 02/03/2012 05:03 PM, Yehuda Sadeh Weinraub wrote:
On Fri, Feb 3, 2012 at 3:33 PM, Jim Schutt jasc...@sandia.gov wrote:
You can try running 'iostat -t -kx -d 1' on the osds, and see whether %util
reaches 100%, and when it happens whether it's due to number of io
operations that are
On Mon, Feb 6, 2012 at 8:20 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/03/2012 05:03 PM, Yehuda Sadeh Weinraub wrote:
On Fri, Feb 3, 2012 at 3:33 PM, Jim Schutt jasc...@sandia.gov wrote:
You can try running 'iostat -t -kx -d 1' on the osds, and see whether
%util
reaches 100%, and when
On 02/06/2012 10:22 AM, Yehuda Sadeh Weinraub wrote:
On Mon, Feb 6, 2012 at 8:20 AM, Jim Schutt jasc...@sandia.gov wrote:
The above suggests to me that the slowdown is a result
of requests not getting submitted at the same rate as
when things are running well.
Yeah, it really looks like
On Mon, Feb 6, 2012 at 10:20 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/06/2012 10:22 AM, Yehuda Sadeh Weinraub wrote:
On Mon, Feb 6, 2012 at 8:20 AM, Jim Schutt jasc...@sandia.gov wrote:
The above suggests to me that the slowdown is a result
of requests not getting submitted at the
On 02/02/2012 05:28 PM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 12:22 PM, Jim Schutt jasc...@sandia.gov wrote:
I found 0 instances of waiting for commit in all my OSD logs for my last
run.
So I never waited on the journal?
Looks like it. Interesting.
So far I'm looking at two
On Feb 3, 2012, at 8:18 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/02/2012 05:28 PM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 12:22 PM, Jim Schutt jasc...@sandia.gov wrote:
I found 0 instances of waiting for commit in all my OSD logs for my last
run.
So I never waited on the
On Fri, 3 Feb 2012, Jim Schutt wrote:
On 02/02/2012 05:28 PM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 12:22 PM, Jim Schutt jasc...@sandia.gov wrote:
I found 0 instances of waiting for commit in all my OSD logs for my last
run.
So I never waited on the journal?
Looks like
(resent because I forgot the list on my original reply)
On 02/01/2012 03:33 PM, Gregory Farnum wrote:
On Wed, Feb 1, 2012 at 7:54 AM, Jim Schutt jasc...@sandia.gov wrote:
Hi,
FWIW, I've been trying to understand op delays under very heavy write
load, and have been working a little with the
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
I'm currently running 24 OSDs/server, one 1TB 7200 RPM SAS drive
per OSD. During a test I watch both OSD servers with both
vmstat and iostat.
During a good period, vmstat says the server is sustaining 2 GB/s
for multiple
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
I'm currently running 24 OSDs/server, one 1TB 7200 RPM SAS drive
per OSD. During a test I watch both OSD servers with both
vmstat and iostat.
During a good period, vmstat says
On Thu, 2 Feb 2012, Jim Schutt wrote:
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
I'm currently running 24 OSDs/server, one 1TB 7200 RPM SAS drive
per OSD. During a test I watch both OSD servers with both
vmstat
On Thu, Feb 2, 2012 at 11:06 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
The typical pattern I see is that a run starts with tens of seconds
of aggregate throughput 2 GB/s. Then
On 02/02/2012 12:15 PM, Sage Weil wrote:
On Thu, 2 Feb 2012, Jim Schutt wrote:
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
I'm currently running 24 OSDs/server, one 1TB 7200 RPM SAS drive
per OSD. During a test I watch
On 02/02/2012 12:32 PM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 11:06 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
The typical pattern I see is that a run starts with tens of
On 02/02/2012 01:22 PM, Jim Schutt wrote:
On 02/02/2012 12:32 PM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 11:06 AM, Jim Schutt jasc...@sandia.gov wrote:
On 02/02/2012 10:52 AM, Gregory Farnum wrote:
On Thu, Feb 2, 2012 at 7:29 AM, Jim Schutt jasc...@sandia.gov wrote:
The typical pattern
On Thu, Feb 2, 2012 at 12:22 PM, Jim Schutt jasc...@sandia.gov wrote:
I found 0 instances of waiting for commit in all my OSD logs for my last
run.
So I never waited on the journal?
Looks like it. Interesting.
So far I'm looking at two behaviours I've noticed that seem anomalous to
me.
Hi,
FWIW, I've been trying to understand op delays under very heavy write
load, and have been working a little with the policy throttler in hopes of
using throttling delays to help track down which ops were backing up.
Without much success, unfortunately.
When I saw the wip-osd-op-tracking
On Wed, Feb 1, 2012 at 7:54 AM, Jim Schutt jasc...@sandia.gov wrote:
Hi,
FWIW, I've been trying to understand op delays under very heavy write
load, and have been working a little with the policy throttler in hopes of
using throttling delays to help track down which ops were backing up.