Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-12-14 Thread Matthew Vernon

On 29/11/17 17:24, Matthew Vernon wrote:
> We have a 3,060 OSD ceph cluster (running Jewel
> 10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by
> which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that
> host), and having ops blocking on it for some time. It w

Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Brad Hubbard

# ps axHo %cpu,stat,pid,tid,pgid,ppid,comm,wchan | grep ceph-osd

To find the actual thread that is using 100% CPU.

# for x in `seq 1 5`; do gdb -batch -p [PID] -ex "thr appl all bt"; echo; done > /tmp/osd.stack.dump

Then look at the stacks for the thread that was using all the CPU and see what i
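The first step above (picking out the busiest thread) can be sketched as a small filter. This is only an illustration, not part of the original advice: the function name `hottest_thread` is hypothetical, and it assumes the column order of the `ps` command quoted above (%cpu first, pid third, tid fourth). It selects rows by PID rather than by command name.

```shell
# Print the TID and %CPU of the busiest thread of one process.
# Reads `ps axHo %cpu,stat,pid,tid,...` output on stdin; $1 is the PID.
# hottest_thread is a hypothetical helper name, not a Ceph or procps tool.
hottest_thread() {
    awk -v pid="$1" '
        # Keep the row with the highest %CPU among threads of the given PID.
        $3 == pid && (!found || $1 + 0 > max) {
            max = $1 + 0; cpu = $1; tid = $4; found = 1
        }
        END { if (found) print tid, cpu }
    '
}
```

Usage would be along the lines of `ps axHo %cpu,stat,pid,tid,pgid,ppid,comm,wchan | hottest_thread [PID]`. The TID it prints corresponds to the LWP number gdb shows next to each thread in the backtrace dump, which makes it easier to find the right stack.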

Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Denes Dolhay

Hello,

You might consider checking the iowait (during the problem), and the dmesg output (after it has recovered). Maybe an issue with the given SATA/SAS/NVMe port?

Regards,
Denes

On 11/29/2017 06:24 PM, Matthew Vernon wrote:

Hi,

We have a 3,060 OSD ceph cluster (running Jewel 10.2.7-0ubuntu0.16.0

Re: [ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Jean-Charles Lopez

Hi Matthew,

Anything special happening on the NIC side that could cause a problem? Packet drops? Incorrect jumbo frame settings causing fragmentation? Have you checked the C-state settings on the box? Have you disabled energy saving settings differently from the other boxes? Any unexpected wait

[ceph-users] One OSD misbehaving (spinning 100% CPU, delayed ops)

2017-11-29 Thread Matthew Vernon

Hi,

We have a 3,060 OSD ceph cluster (running Jewel 10.2.7-0ubuntu0.16.04.1), and one OSD on one host keeps misbehaving - by which I mean it keeps spinning ~100% CPU (cf ~5% for other OSDs on that host), and having ops blocking on it for some time. It will then behave for a bit, and then go back t
