Stuck Request

2012-10-29 Thread Ian Pye
Guys, I'm running a three-node cluster (version 0.53), and after running for a while under a constant write load generated by two daemons, I am seeing that one request is totally blocked: [WRN] 1 slow requests, 1 included below; oldest blocked for 7550.891933 secs 2012-10-29 10:33:54.689563 osd.0
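
For anyone wanting to track warnings like the one quoted above, here is a minimal sketch that pulls the request count and the blocked duration out of such a line. It assumes the message keeps exactly the wording shown in this post; the function name parse_slow_request and the dictionary keys are hypothetical and not part of Ceph.

```python
import re

# Pattern based only on the warning text quoted in this thread (assumption:
# other Ceph versions may word the message differently).
SLOW_REQUEST_RE = re.compile(
    r"\[WRN\]\s+(?P<count>\d+) slow requests, "
    r"(?P<included>\d+) included below; "
    r"oldest blocked for (?P<secs>\d+(?:\.\d+)?) secs"
)


def parse_slow_request(line):
    """Return the slow-request fields found in a log line, or None if absent."""
    match = SLOW_REQUEST_RE.search(line)
    if match is None:
        return None
    return {
        "count": int(match["count"]),
        "included": int(match["included"]),
        "oldest_blocked_secs": float(match["secs"]),
    }


sample = ("[WRN] 1 slow requests, 1 included below; "
          "oldest blocked for 7550.891933 secs")
info = parse_slow_request(sample)
if info is not None:
    # Convert to hours for readability (7550.89 s is a little over two hours).
    print(f"{info['count']} slow request(s), oldest blocked "
          f"{info['oldest_blocked_secs'] / 3600:.2f} h")
```

A line that does not contain the warning simply yields None, so the function can be run over a whole log without special-casing other messages.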

Re: Stuck Request

2012-10-29 Thread Samuel Just
Interesting, I don't think the request is stalled. I think we completed the request, but leaked a reference to the request structure. Do you see IO from the clients stall? What is the output of ceph -s? What version are you running (ceph-osd --version)? -Sam On Mon, Oct 29, 2012 at 10:53 AM,

Re: Stuck Request

2012-10-29 Thread Ian Pye
The clients' IO held up fine, and I don't see any signs of them blocking. The writes are done inside of an aio_operate() rados call. In the client logs too, I don't see any record of a failed write.
ceph -s
health HEALTH_OK
monmap e1: 1 mons at {a=10.25.36.11:6789/0}, election epoch 2,