[ceph-users] Re: monitor connection error

2021-05-11 Thread Eugen Block
Hi, What is this error trying to tell me? TIA It tells you that the cluster is not reachable by the client; this can have various reasons. Can you show the contents of your conf file (cat /etc/ceph/es-c1.conf)? Is the monitor service up and running? I take it you don't use cephadm yet so it's
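A minimal sketch of the checks suggested here, reusing the cluster name (es-c1) and hostname (cnode-01) from this thread; the systemd unit name is an assumption:

```
# inspect the client-side config for this cluster
cat /etc/ceph/es-c1.conf
# on the monitor host, confirm the mon daemon is running, then query it
sudo systemctl status ceph-mon@cnode-01   # unit name assumed from the hostname
sudo ceph --cluster es-c1 -s              # should return cluster status if the mon is reachable
```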

[ceph-users] Re: cephfs mds issues

2021-05-11 Thread Mazzystr
I jogged my own memory... My mon servers came back and didn't take the full ratio settings. ceph osd state reported OSDs in full status (96%). That caused pools to report full. I run hotter than the default settings. We buy disk when we hit 98% capacity, not sooner. Arguing that policy is like yel
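A sketch of how such thresholds are raised cluster-wide; the values mirror the poster's stated policy, not Ceph's defaults (0.85 nearfull / 0.95 full):

```
ceph osd set-nearfull-ratio 0.95
ceph osd set-full-ratio 0.98
# confirm which ratios the cluster is actually using
ceph osd dump | grep ratio
```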

[ceph-users] cephfs mds issues

2021-05-11 Thread Mazzystr
I did a simple OS update and reboot. Now the MDS is stuck in replay. I'm running Octopus. debug mds = 20 shows some pretty lame logs: # tail -f ceph-mds.bridge.log 2021-05-11T18:24:04.859-0700 7f41314a1700 20 mds.0.cache upkeep thread waiting interval 1s 2021-05-11T18:24:05.860-0700 7f41314a1700 10 m
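A sketch of raising the MDS debug level on the running daemon, assuming the MDS is named "bridge" as the log file name suggests:

```
# bump debug logging via the admin socket on the MDS host
ceph daemon mds.bridge config set debug_mds 20
# follow the log and watch for replay progress
tail -f /var/log/ceph/ceph-mds.bridge.log | grep -i replay
```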

[ceph-users] monitor connection error

2021-05-11 Thread Tuffli, Chuck
Hi I'm new to ceph and have been following the Manual Deployment document [1]. The process seems to work correctly until step 18 ("Verify that the monitor is running"): [centos@cnode-01 ~]$ uname -a Linux cnode-01 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_
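For reference, a sketch of what step 18 boils down to, with the systemd unit name assumed from the hostname above:

```
# the monitor should answer a status query
sudo ceph -s
# if it hangs, check that the mon daemon is up and listening on its messenger port(s)
sudo systemctl status ceph-mon@cnode-01
ss -tlnp | grep -E '6789|3300'
```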

[ceph-users] DocuBetter Meeting -- 12 May 2021 1730 UTC

2021-05-11 Thread John Zachary Dover
There will be a DocuBetter Meeting held on 12 May 2021 at 1730 UTC. This is the monthly DocuBetter Meeting that is more convenient for European and North American Ceph contributors than the other meeting, which is convenient for people in Australia and Asia (and which is very rarely attended). I

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
It actually WAS the amount of watchers... narf... This is so embarrassing... Thanks a lot for all your input. On Tue, 11 May 2021 at 13:54, Boris Behrens wrote: > I tried to debug it with --debug-ms=1. > Maybe someone could help me to wrap my head around it? > https://pastebin.com/LD9qrm3x >
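A sketch of how the watchers can be inspected; the control pool name is an assumption (a default zone typically uses default.rgw.control):

```
# every running radosgw should hold a watch on the RGW control objects
for obj in $(rados -p default.rgw.control ls); do
    echo "== $obj =="
    rados -p default.rgw.control listwatchers "$obj"
done
```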

[ceph-users] MonSession vs TCP connection

2021-05-11 Thread Jan Pekař - Imatic
Hi all, I would like to "pair" a MonSession with its TCP connection to get to the real process that is using that session. I need this to identify processes with old ceph features. A MonSession looks like MonSession(client.84324148 [..IP...]:0/3096235764 is open allow *, features 0x27018fb86aa42ada (jewel)
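A sketch of one way to correlate the two, assuming the mon id equals the short hostname and <mon-ip> stands for the monitor's address:

```
# on the monitor host: dump sessions (shows client address and feature bits)
ceph daemon mon.$(hostname -s) sessions
# on the suspected client host: list processes holding TCP connections to the mon
ss -tnp dst <mon-ip>:6789
```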

[ceph-users] Re: Which EC-code for 6 servers?

2021-05-11 Thread Szabo, Istvan (Agoda)
OK, we will stay with 2:2 or 3:2 so that once one host goes down it can go to the other active host. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] Re: "No space left on device" when deleting a file

2021-05-11 Thread Mark Schouten
On Tue, May 11, 2021 at 09:53:10AM +0200, Mark Schouten wrote: > This helped me too. However, should I see num_strays decrease again? > I'm running a `find -ls` over my CephFS tree.. This helps; the number of stray files is slowly decreasing. But given the number of files in the cluster, it'll ta
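A sketch of how the stray counter can be watched while the scan runs; the MDS name is a placeholder:

```
# num_strays is exposed through the MDS perf counters
ceph daemon mds.<name> perf dump mds_cache | jq '.mds_cache | {num_strays, num_strays_delayed}'
```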

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-05-11 Thread Mark Schouten
On Tue, May 11, 2021 at 09:13:51AM +, Eugen Block wrote: > You can check the remaining active daemons if they have pinned subtrees: > > ceph daemon mds.daemon-a get subtrees | jq '.[] | [.dir.path, .auth_first]' This gives me output, a whole lot of lines. However, none of the directories are

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-05-11 Thread Konstantin Shalygin
> On 11 May 2021, at 14:24, Ilya Dryomov wrote: > > No, as mentioned above max_osds being greater is not a problem per se. > Having max_osds set to 1 when you only have a few dozen is going to > waste a lot of memory and network bandwidth, but if it is just slightly > bigger it's not someth
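A sketch of checking and trimming the value; the number is only an example and must stay above the highest OSD id in use:

```
ceph osd getmaxosd
ceph osd setmaxosd 12   # example value; never set it below your highest osd id + 1
```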

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
I tried to debug it with --debug-ms=1. Maybe someone could help me to wrap my head around it? https://pastebin.com/LD9qrm3x On Tue, 11 May 2021 at 11:17, Boris Behrens wrote: > Good call. I just restarted the whole cluster, but the problem still > persists. > I don't think it is a proble

[ceph-users] Re: Which EC-code for 6 servers?

2021-05-11 Thread Frank Schilder
For performance reasons stay with powers of 2 for k. Any of 2+2 or 4+2 will work with your set-up and tolerate one (!) host failure with continued RW access and two host failures with RO (!) access. To tolerate 2 host failures with RW access, you need m=3, which is probably a bit much with 6 ho
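A sketch of a 4+2 profile with host as the failure domain, matching the 6-host setup discussed here; profile and pool names are placeholders:

```
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 128 128 erasure ec-4-2
```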

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-05-11 Thread Ilya Dryomov
On Tue, May 11, 2021 at 10:50 AM Konstantin Shalygin wrote: > > Hi Ilya, > > On 3 May 2021, at 14:15, Ilya Dryomov wrote: > > I don't think empty directories matter at this point. You may not have > had 12 OSDs at any point in time, but the max_osd value appears to have > gotten bumped when you

[ceph-users] "radosgw-admin bucket radoslist" loops when a multipart upload is happening

2021-05-11 Thread Boris Behrens
Hi all, I am still searching for orphan objects and came across a strange bug: there is a huge multipart upload happening (around 4 TB), and listing the rados objects in the bucket loops over the multipart upload. -- The "UTF-8-Probleme" (UTF-8 problems) self-help group will meet at a different location this time, in the big
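A sketch of the commands involved, with the bucket name as a placeholder:

```
# the listing that loops while a large multipart upload is in flight
radosgw-admin bucket radoslist --bucket=<bucket> > radoslist.txt
# per-bucket object/multipart consistency can also be checked
radosgw-admin bucket check --bucket=<bucket> --check-objects
```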

[ceph-users] one ODS out-down after upgrade to v16.2.3

2021-05-11 Thread Milosz Szewczak
Hi, I updated the cluster as usual to the newest version, 16.2.3, from 16.2.1: ceph orch upgrade start --ceph-version 16.2.3 Everything went fine except that one OSD, OSD.54, stopped working. On the host I see in the logs (after trying to start it manually): ``` root@ceph-nvme01:/var/lib/ceph/77fc6eb4-7146-11eb-aa58-55847fcdb1f1/osd
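A sketch of how the failed daemon can be inspected and retried through the orchestrator, assuming a cephadm-managed host:

```
# show the daemon's logs as cephadm sees them
cephadm logs --name osd.54
# retry the daemon via the orchestrator
ceph orch daemon restart osd.54
```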

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
Hi Amit, it is the same physical interface but different VLANs. I checked all IP addresses from all systems and everything is directly connected, without any gateway hops. On Tue, 11 May 2021 at 10:59, Amit Ghadge wrote: > I hope you are using a single network interface for the public and clu

[ceph-users] Re: radosgw-admin user create takes a long time (with failed to distribute cache message)

2021-05-11 Thread Boris Behrens
Good call. I just restarted the whole cluster, but the problem still persists. I don't think it is a problem with RADOS, but with the radosgw. But I still struggle to pin down the issue. On Tue, 11 May 2021 at 10:45, Thomas Schneider <thomas.schneider-...@ruhr-uni-bochum.de> wrote: > Hey a

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-05-11 Thread Eugen Block
You can check the remaining active daemons if they have pinned subtrees: ceph daemon mds.daemon-a get subtrees | jq '.[] | [.dir.path, .auth_first]' [ "/dir1/subdir1", 6 ] [ "", 0 ] [ "~mds6", 6 ] If there's no pinning enabled it should probably look like this: [ "", 0 ] [ "~m
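For context, a sketch of how such pins are set in the first place; the mount path is a placeholder:

```
# pinning is a virtual extended attribute on the directory; -1 removes the pin
setfattr -n ceph.dir.pin -v 6 /mnt/cephfs/dir1/subdir1
setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/dir1/subdir1
```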

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-05-11 Thread Mark Schouten
On Tue, May 11, 2021 at 08:47:26AM +, Eugen Block wrote: > I don't have a Luminous cluster at hand right now but setting max_mds to 1 > already should take care of it and stop MDS services. Do you have pinning > enabled (subdirectories pinned to a specific MDS)? Not on this cluster, AFAIK. How

[ceph-users] Re: cephfs mount problems with 5.11 kernel - not a ipv6 problem

2021-05-11 Thread Konstantin Shalygin
Hi Ilya, > On 3 May 2021, at 14:15, Ilya Dryomov wrote: > > I don't think empty directories matter at this point. You may not have > had 12 OSDs at any point in time, but the max_osd value appears to have > gotten bumped when you were replacing those disks. > > Note that max_osd being greater

[ceph-users] Re: Upgrade tips from Luminous to Nautilus?

2021-05-11 Thread Eugen Block
I don't have a Luminous cluster at hand right now but setting max_mds to 1 already should take care of it and stop MDS services. Do you have pinning enabled (subdirectories pinned to a specific MDS)? Quoting Mark Schouten: On Thu, Apr 29, 2021 at 10:58:15AM +0200, Mark Schouten wrote: W
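A sketch of shrinking to a single active MDS before the upgrade; the filesystem name is a placeholder, and on Luminous the extra rank may also need to be deactivated explicitly:

```
ceph fs set <fsname> max_mds 1
ceph mds deactivate <fsname>:1   # Luminous-era command; later releases handle this automatically
ceph status                      # wait until only rank 0 reports as active
```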

[ceph-users] Re: "No space left on device" when deleting a file

2021-05-11 Thread Mark Schouten
[Resent because of incorrect ceph-users@ address..] On Tue, Mar 26, 2019 at 05:19:24PM +, Toby Darling wrote: > Hi Dan > > Thanks! > > ceph tell mds.ceph1 config set mds_bal_fragment_size_max 20 > > got us running again. This helped me too. However, should I see num_strays decrease a
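A sketch of the workaround quoted above; the value is a placeholder since the number in the thread is truncated:

```
# apply at runtime on the affected MDS (mds name taken from the quoted message)
ceph tell mds.ceph1 config set mds_bal_fragment_size_max <value>
# on Mimic or newer it can also be persisted in the central config store
ceph config set mds mds_bal_fragment_size_max <value>
```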

[ceph-users] CephFS Subvolume Snapshot data corruption?

2021-05-11 Thread Andras Sali
Hi, We experienced a strange issue with a CephFS snapshot becoming partially unreadable. The snapshot was created about two months ago and we started a read operation from it. For a while everything was working fine, with all directories accessible; however, after some point clients (FUSE, v15.2.9) s