[ceph-users] Re: after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

2021-05-14 Thread Igor Fedotov
This looks similar to #50656 indeed. Hopefully we will fix that next week. Thanks, Igor On 5/14/2021 9:09 PM, Neha Ojha wrote: On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus wrote: Hello, I will try to keep it sad and short :) :( PS: sorry if this is a duplicate, I tried posting it from the web also.

[ceph-users] Re: CRUSH rule for EC 6+2 on 6-node cluster

2021-05-14 Thread Bryan Stillwell
This works better than my solution. It allows the cluster to put more PGs on the systems with more space on them: # for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do > echo $pg > for osd in $(ceph pg map $pg -f json | jq -r '.up[]'); do > ceph osd
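
The loop above is cut off in the archive preview. A minimal sketch of what such a check might look like, assuming the pool name cephfs_data_ec62 from the snippet and that the intent is to tally how many PG shards land on each OSD (the tally step is an assumption, not the original author's script):

    # List the "up" OSDs of every PG in the EC pool and count shards per OSD
    for pg in $(ceph pg ls-by-pool cephfs_data_ec62 -f json | jq -r '.pg_stats[].pgid'); do
        for osd in $(ceph pg map "$pg" -f json | jq -r '.up[]'); do
            echo "osd.$osd"
        done
    done | sort | uniq -c | sort -rn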

[ceph-users] after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

2021-05-14 Thread Andrius Jurkus
Hello, I will try to keep it sad and short :) :( PS: sorry if this is a duplicate, I tried posting it from the web also. Today I upgraded from 16.2.3 to 16.2.4 and added a few hosts and OSDs. After data migration for a few hours, 1 SSD failed, then another and another, 1 by 1. Now I have the cluster in pause and

[ceph-users] radosgw lost config during upgrade 14.2.16 -> 21

2021-05-14 Thread Jan Kasprzak
Hello, I have just upgraded my cluster from 14.2.16 to 14.2.21, and after the upgrade, radosgw was listening on the default port 7480 instead of the SSL port it used before the upgrade. It might be I mishandled "ceph config assimilate-conf" previously or forgot to restart radosgw after the
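
A rough sketch of how to check and re-pin the frontend setting in this situation; the rgw section name, port, and certificate path below are placeholders, not values from the original post:

    # Was rgw_frontends assimilated into the mon config database at all?
    ceph config dump | grep rgw_frontends
    # Hypothetical example of setting an SSL frontend for one rgw instance
    ceph config set client.rgw.myhost rgw_frontends "beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem"
    # radosgw only picks up the frontend on restart (unit name is a placeholder)
    systemctl restart ceph-radosgw@rgw.myhost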

[ceph-users] Re: after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

2021-05-14 Thread Neha Ojha
You are welcome! We still need to get to the bottom of this, I will update the tracker to make a note of this occurrence. Thanks, Neha On Fri, May 14, 2021 at 12:25 PM Andrius Jurkus wrote: > > Big thanks, Much appreciated help. > > It probably is same bug. > > bluestore_allocator = bitmap > >
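
For readers hitting the same crashes, the workaround mentioned above (switching back to the bitmap allocator) can be applied roughly like this; the OSD id is a placeholder and the restart method depends on how the cluster was deployed:

    # Revert BlueStore to the bitmap allocator
    ceph config set osd bluestore_allocator bitmap
    # The setting only takes effect on OSD start; restart OSDs one at a time,
    # e.g. on a cephadm cluster:
    ceph orch daemon restart osd.12
    # Confirm the running value
    ceph config show osd.12 bluestore_allocator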

[ceph-users] Re: "No space left on device" when deleting a file

2021-05-14 Thread Mark Schouten
On Tue, May 11, 2021 at 02:55:05PM +0200, Mark Schouten wrote: > On Tue, May 11, 2021 at 09:53:10AM +0200, Mark Schouten wrote: > > This helped me too. However, should I see num_strays decrease again? > > I'm running a `find -ls` over my CephFS tree.. > > This helps, the amount of stray files is
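
If it helps anyone else, the stray count can be watched from the MDS perf counters while the find runs; the daemon name below is a placeholder:

    # num_strays is exposed under the mds_cache perf counters (run on the MDS host)
    watch -n 10 'ceph daemon mds.a perf dump mds_cache | grep num_strays'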

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-05-14 Thread Mark Schouten
On Mon, May 10, 2021 at 10:46:45PM +0200, Mark Schouten wrote: > I still have three active ranks. Do I simply restart two of the MDS'es > and force max_mds to one daemon, or is there a nicer way to move two > mds'es from active to standby? It seems (documentation was no longer available, so I
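
For reference, the usual sequence for dropping to a single active rank before such an upgrade looks roughly like this; the filesystem name is a placeholder, and on Luminous the explicit deactivate step may still be needed (later releases stop the extra ranks on their own):

    # Reduce to one active MDS rank
    ceph fs set cephfs max_mds 1
    # On Luminous, stop the non-zero ranks explicitly
    ceph mds deactivate cephfs:2
    ceph mds deactivate cephfs:1
    # Wait until ceph status shows a single active MDS
    ceph status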

[ceph-users] cephadm stalled after adjusting placement

2021-05-14 Thread Bryan Stillwell
I'm looking for help in figuring out why cephadm isn't making any progress after I told it to redeploy an mds daemon with: ceph orch daemon redeploy mds.cephfs.aladdin.kgokhr ceph/ceph:v15.2.12 The output from 'ceph -W cephadm' just says: 2021-05-14T16:24:46.628084+ mgr.paris.glbvov [INF]
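
A few generic things worth checking when the cephadm module looks stalled; this is a sketch, not advice from the thread itself:

    # Recent activity on the cephadm log channel
    ceph log last 100 debug cephadm
    # How cephadm currently sees the mds daemons
    ceph orch ps --daemon-type mds
    # Failing over the active mgr restarts the cephadm module and often
    # un-sticks a queued action (mgr name taken from the log line above)
    ceph mgr fail paris.glbvov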

[ceph-users] Re: after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

2021-05-14 Thread Neha Ojha
On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus wrote: > > Hello, I will try to keep it sad and short :) :( PS: sorry if this is a > duplicate, I tried posting it from the web also. > > Today I upgraded from 16.2.3 to 16.2.4 and added a few hosts and OSDs. > After data migration for a few hours, 1 SSD failed,

[ceph-users] ceph-Dokan on windows 10 not working after upgrade to pacific

2021-05-14 Thread Robert W. Eckert
Hi - I recently upgraded to Pacific, and I am now getting an error connecting on my Windows 10 machine. The error is handle_auth_bad_method; I tried a few combinations of cephx/none on the monitors, but I keep getting the same error. The same config (with paths updated) and keyring works on

[ceph-users] Re: Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-14 Thread David Orman
We've created a PR to fix the root cause of this issue: https://github.com/alfredodeza/remoto/pull/63 Thank you, David On Mon, May 10, 2021 at 7:29 PM David Orman wrote: > > Hi Sage, > > We've got 2.0.27 installed. I restarted all the manager pods, just in > case, and I have the same behavior

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Yup, I just saw, it should have 3GB :/ I will wait until the system goes back to normal and will increase it. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
How much is yours? Mine is vm.min_free_kbytes = 90112. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- From:

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Is there anything that should be set just to be sure an OOM kill does not happen? Or nothing? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Ok, it seems like it doesn’t go below 600MB out of the 256GB; let’s wait until the PG degradation has healed. Did I do something wrong? I set the bluefs option in the global config, and restarted ceph.target on the OSD node :/ ? Does this need some special thing to apply? Istvan Szabo Senior

[ceph-users] Re: mon vanished after cephadm upgrade

2021-05-14 Thread Ashley Merrick
Hello, it is not listed under ceph -s; ceph -s reports no issues on the cluster. It is listed under orch ps and the dashboard, but reports "mon.sn-m01 sn-m01 stopped   114s ago  4M  -  ". Let me know if there is anything else useful you would like before I try to remove and redeploy. Thanks > On Fri May 14 2021

[ceph-users] Re: mon vanished after cephadm upgrade

2021-05-14 Thread Sebastian Wagner
Hi Ashley, is sn-m01 listed in `ceph -s`? Which hosts are listed in `ceph orch ps --daemon-type mon`? Otherwise, there are two helpful commands now: * `ceph orch daemon rm mon.sn-m01` to remove the mon * `ceph orch daemon start mon.sn-m01` to start it again On 14.05.21 at 14:14,

[ceph-users] Limit memory of ceph-mgr

2021-05-14 Thread mabi
Hello, I just noticed on my small Octopus cluster that the ceph-mgr on a mgr/mon node uses 3.6GB of resident memory (RES) as you can see below from the top output: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2704 167 20 0 5030528 3.6g 35796 S

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
When does this stop? When it dies … :D Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- From: Konstantin Shalygin Sent:

[ceph-users] mon vanished after cephadm upgrade

2021-05-14 Thread Ashley Merrick
I had a 3-mon Ceph cluster; after updating from 15.2.x to 16.2.x, one of my mons is showing in a stopped state in the Ceph Dashboard. Checking the cephadm logs on the server in question I can see "/usr/bin/docker: Error: No such object:

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Is it also normal that with this buffered_io turned on, it eats all the memory on the system? Hmmm. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
> On 14 May 2021, at 14:20, Szabo, Istvan (Agoda) > wrote: > > How much is yours? Mine is vm.min_free_kbytes = 90112. I use 135168 > On 14 May 2021, at 14:31, Szabo, Istvan (Agoda) > wrote: > > Yup, I just saw, it should have 3GB :/ I will wait until the system goes back to > normal and

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Irek Fasikhov
Hi. https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2/html/ceph_object_gateway_for_production/deploying_a_cluster#reserving_free_memory_for_osds Fri, 14 May 2021 at 14:21, Szabo, Istvan (Agoda) : > How much is yours? Mine is vm.min_free_kbytes = 90112. > > Istvan Szabo >

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
I suggest looking into the vm.min_free_kbytes kernel option and doubling it. k > On 14 May 2021, at 13:45, Szabo, Istvan (Agoda) > wrote: > > Is there anything that should be set just to be sure an OOM kill does not happen? Or > nothing?
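
A sketch of how the suggested change would be applied and persisted; 180224 is simply double the poster's current value, and the sysctl.d file name is a placeholder:

    # Double the current reserve (90112 -> 180224)
    sysctl -w vm.min_free_kbytes=180224
    # Persist it across reboots
    echo 'vm.min_free_kbytes = 180224' > /etc/sysctl.d/99-ceph-reserve.conf
    sysctl --system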

[ceph-users] How to "out" a mon/mgr node with orchestrator

2021-05-14 Thread mabi
Hello, I need to re-install one node of my Octopus cluster (installed with cephadm) which is a mon/mgr node and did not find in the documentation how to do that with the new ceph orchestrator commands. So my question would be what are the "ceph orch" commands I need to run in order to "out"
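
A rough sketch of the orchestrator steps such a re-install usually involves; the host names and placements are placeholders, and this is not taken from a reply in the thread:

    # Shrink the mon/mgr placement so the daemons move off the node being re-installed
    ceph orch apply mon --placement="node2,node3,node4"
    ceph orch apply mgr --placement="node2,node3"
    # Once its daemons are gone, drop the host from the inventory
    ceph orch host rm node1
    # After the re-install, add the host back and widen the placements again
    ceph orch host add node1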

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
It's enough, it should be true now... k > On 14 May 2021, at 12:51, Szabo, Istvan (Agoda) > wrote: > > Did I do something wrong? > I set the bluefs option in the global config, and restarted ceph.target on > the OSD node :/ ? > > Does this need some special thing to apply?

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Hi, it is quite an old cluster, Luminous 12.2.8. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message- From: Konstantin

[ceph-users] Re: RGW segmentation fault on Pacific 16.2.1 with multipart upload

2021-05-14 Thread Daniel Iwan
Precisely this! Thank you very much for the links. This caught me by surprise after upgrading my test cluster to 16.2.1. Looks like a regression in Pacific. The fix is not included in 16.2.3 as far as I understand. Actually, with my attempt to solve the problem I upgraded to 16.2.3 at some point. RGW did

[ceph-users] Re: Zabbix module Octopus 15.2.3

2021-05-14 Thread Gerdriaan Mulder
Hi Reed, Gert, list, On 28/07/2020 23:42, Reed Dier wrote: > I'm going to resurrect this thread to throw my hat in the ring as I am > having this issue as well. I did not see any solutions in this thread (be it this one, or any more recent thread), so forgive me for re-resurrecting this thread

[ceph-users] bluefs_buffered_io turn to true

2021-05-14 Thread Szabo, Istvan (Agoda)
Hi, I had an issue with snaptrim after a huge amount of deleted data; it slows down the team's operations due to the snaptrim and snaptrim_wait PGs. I've changed a couple of things: debug_ms = 0/0 # default 0/5 osd_snap_trim_priority = 1 # default 5 osd_pg_max_concurrent_snap_trims = 1 # default 2
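
Since later in the thread the cluster turns out to be Luminous, which has no central config store, the option would typically go into ceph.conf followed by an OSD restart; a rough sketch, with osd.0 as a placeholder:

    # /etc/ceph/ceph.conf on the OSD hosts
    [osd]
    bluefs_buffered_io = true

    # then restart the OSDs and confirm the running value, e.g.:
    #   systemctl restart ceph-osd.target
    #   ceph daemon osd.0 config get bluefs_buffered_io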

[ceph-users] Re: v14.2.21 Nautilus released

2021-05-14 Thread Ilya Dryomov
On Fri, May 14, 2021 at 8:20 AM Rainer Krienke wrote: > > Hello, > > has the "negative progress bug" also been fixed in 14.2.21? I cannot > find any info about this in the changelog? Unfortunately not -- this was a hotfix release driven by rgw and dashboard CVEs. Thanks, Ilya

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
Nope, the kernel reserves enough memory to free under pressure, for example a 36-OSD, 0.5 TiB RAM host:
               total        used        free      shared  buff/cache   available
Mem:            502G        168G        2.9G         18M        331G        472G
Swap:           952M        248M        704M

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
> On 14 May 2021, at 10:50, Szabo, Istvan (Agoda) > wrote: > > Is it also normal that with this buffered_io turned on, it eats all the memory on the > system? Hmmm. > This is what this option actually does - it uses all free memory as cache for bluefs speedups k

[ceph-users] Re: [Suspicious newsletter] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
I recommend upgrading at least to 12.2.13; for Luminous, even the difference between .12 and .13 is significant in the code. k > On 14 May 2021, at 09:22, Szabo, Istvan (Agoda) > wrote: > > It is quite an old cluster, Luminous 12.2.8.

[ceph-users] Re: v14.2.21 Nautilus released

2021-05-14 Thread Rainer Krienke
Hello, has the "negative progress bug" also been fixed in 14.2.21? I cannot find any info about this in the changelog. Thanks Rainer On 14.05.21 at 02:01, David Galloway wrote: This is a hotfix release addressing a number of security issues and regressions. We recommend all users update

[ceph-users] Re: bluefs_buffered_io turn to true

2021-05-14 Thread Konstantin Shalygin
Hi, this is not normal; it's something different I think, like a CRUSH change on restart. This option will be enabled by default again in the next Nautilus release, so you can use it now with 14.2.19-20. k Sent from my iPhone > On 14 May 2021, at 08:21, Szabo, Istvan (Agoda) > wrote: > > Hi, > >