Looks like the kernel bug with Ceph and XFS was fixed. I haven't
tested it yet, but just wanted to give an update.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1527062
On Tue, Dec 8, 2015 at 8:05 AM Scottix wrote:
> I can confirm it seems to be kernels greater than 3.16, we had
I can confirm it seems to be kernels greater than 3.16. We had this problem
where servers would lock up and we had to perform restarts on a weekly basis.
We downgraded to 3.16, and since then we have not had to do any restarts.
I did find this thread in the XFS forums and I am not sure if it has been
fixed.
We run deep scrubs via cron with a script so we know when deep scrubs are
happening, and we've seen nodes fail both during deep scrubbing and while
no deep scrubs are occurring, so I'm pretty sure it's not related.
On Tue, Dec 8, 2015 at 2:42 AM, Benedikt Fraunhofer
wrote:
> Hi Tom,
>
> 2015-12-0
The same thing happens to my setup with CentOS 7.x + a non-stock kernel
(kernel-ml from elrepo).
I was not happy with the IOPS I got out of the stock CentOS 7.x kernel, so I did
the kernel upgrade, and crashes started to happen until some of the OSDs
became unbootable altogether. The funny thing is that I was no
Hi Tom,
2015-12-08 10:34 GMT+01:00 Tom Christensen :
> We didn't go forward to 4.2 as it's a large production cluster, and we just
> needed the problem fixed. We'll probably test out 4.2 in the next couple
unfortunately we don't have the luxury of a test cluster.
and to add to that, we couldn't s
We didn't go forward to 4.2 as it's a large production cluster, and we just
needed the problem fixed. We'll probably test out 4.2 in the next couple
months, but this one slipped past us as it didn't occur in our test cluster
until after we had upgraded production. In our experience it takes about
Hi Tom,
> We have been seeing this same behavior on a cluster that has been perfectly
> happy until we upgraded to the ubuntu vivid 3.19 kernel. We are in the
I can't recall when we gave 3.19 a shot, but now that you say it... The
cluster was happy for >9 months with 3.16.
Did you try 4.2 or do y
We have been seeing this same behavior on a cluster that has been perfectly
happy until we upgraded to the ubuntu vivid 3.19 kernel. We are in the
process of "upgrading" back to the 3.16 kernel across our cluster as we've
not seen this behavior on that kernel for over 6 months and we're pretty
str
> On 08 Dec 2015, at 08:57, Benedikt Fraunhofer wrote:
>
> Hi Jan,
>
>> Doesn't look near the limit currently (but I suppose you rebooted it in the
>> meantime?).
>
> the box these numbers came from has an uptime of 13 days
> so it's one of the boxes that did survive yesterday's half-cluster-wi
Hi Jan,
> Doesn't look near the limit currently (but I suppose you rebooted it in the
> meantime?).
the box these numbers came from has an uptime of 13 days,
so it's one of the boxes that did survive yesterday's half-cluster-wide reboot.
> Did iostat say anything about the drives? (btw dm-1 and dm
Doesn't look near the limit currently (but I suppose you rebooted it in the
meantime?).
Did iostat say anything about the drives? (btw, what are dm-1 and dm-6? Are those
your data drives?) - were they really overloaded?
Jan
> On 08 Dec 2015, at 08:41, Benedikt Fraunhofer wrote:
>
> Hi Jan,
>
>
Hi Jan,
we had 65k for pid_max, which made
kernel.threads-max = 1030520.
or
kernel.threads-max = 256832
(looks like it depends on the number of CPUs?)
currently we've
root@ceph1-store209:~# sysctl -a | grep -e thread -e pid
kernel.cad_pid = 1
kernel.core_uses_pid = 0
kernel.ns_last_pid = 60298
k
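As an aside, if I remember the kernel code right, the default kernel.threads-max is sized from total RAM at boot rather than from the CPU count, which would explain the different values on different boxes. A read-only sketch to compare the two on a given box (Linux /proc paths; nothing Ceph-specific):

```shell
# Read-only sketch: show threads-max next to total memory. The kernel
# derives the default threads-max from available RAM, so boxes with
# different memory sizes report different values.
threads_max=$(cat /proc/sys/kernel/threads-max)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
echo "threads-max=$threads_max MemTotal=${mem_kb} kB"
```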
And how many PIDs do you have currently?
This should do it, I think:
# ps axH | wc -l
Jan
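For anyone following along, the check Jan suggests can be wrapped in a small read-only shell sketch (Linux-specific /proc paths, runs unprivileged):

```shell
# Read-only sketch: compare the number of threads currently alive on the
# box against kernel.pid_max. Thread IDs come out of the same ID space as
# PIDs, so a busy OSD box can exhaust pid_max with threads alone.
threads=$(ps axH | wc -l)
pid_max=$(cat /proc/sys/kernel/pid_max)
echo "threads=$threads pid_max=$pid_max"
```

If the two numbers get close, new threads fail to spawn, which can look exactly like a hung or overloaded box.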
> On 08 Dec 2015, at 08:26, Benedikt Fraunhofer wrote:
>
> Hi Jan,
>
> we initially had to bump it once we had more than 12 osds
> per box. But we'll change that to the values you provided.
>
> Thx!
>
> Be
Hi Jan,
we initially had to bump it once we had more than 12 osds
per box. But we'll change that to the values you provided.
Thx!
Benedikt
2015-12-08 8:15 GMT+01:00 Jan Schermer :
> What is the setting of sysctl kernel.pid_max?
> You really need to have this:
> kernel.pid_max = 4194304
> (I thi
What is the setting of sysctl kernel.pid_max?
You really need to have this:
kernel.pid_max = 4194304
(I think it also sets this as well: kernel.threads-max = 4194304)
I think you are running out of process IDs.
Jan
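A read-only sketch of that check, using the value from Jan's mail (adjust if your target differs); raising the limit itself needs root:

```shell
# Sketch: warn if kernel.pid_max is below the value recommended in the
# thread. Read-only, so it can run unprivileged on any Linux box.
want=4194304
have=$(cat /proc/sys/kernel/pid_max)
if [ "$have" -lt "$want" ]; then
    echo "pid_max=$have is below $want; consider: sysctl -w kernel.pid_max=$want"
else
    echo "pid_max=$have looks fine"
fi
```

To persist the setting across reboots, the usual place is a `kernel.pid_max` line in /etc/sysctl.conf or a drop-in under /etc/sysctl.d/.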
> On 08 Dec 2015, at 08:10, Benedikt Fraunhofer wrote:
>
> Hello Cephers,
>
>
Hello Cephers,
lately, our ceph-cluster started to show some weird behavior:
the osd boxes show a load of 5000-15000 before the osds get marked down.
Usually the box is fully usable: even "apt-get dist-upgrade" runs smoothly,
you can read and write to any disk; the only things you can't do are strace
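Not from the thread, but a sketch of the kind of monitoring that can catch this before the OSDs get marked down (it assumes the daemons are named ceph-osd, as in a stock install):

```shell
# Sketch: snapshot per-OSD thread counts by reading /proc/<pid>/task, so
# runaway thread growth can be correlated with the load spikes.
count_threads() {
    # number of entries in /proc/<pid>/task = number of threads
    ls "/proc/$1/task" 2>/dev/null | wc -l
}
for pid in $(pgrep ceph-osd); do
    echo "$(date +%s) pid=$pid threads=$(count_threads "$pid")"
done
```

Run from cron, the log makes it easy to see whether thread counts climb toward the pid/thread limits ahead of each lockup.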