[ceph-users] Re: Billions of objects upload with bluefs spillover cause osds down?

2021-09-27 Thread Janne Johansson
On Tue, 28 Sep 2021 at 08:15, Szabo, Istvan (Agoda) wrote: > Regarding point 2, how can it spill over if I don't use a db device, just > block? It can't, but it ACTS as if you had 100% spillover. The act of spilling over is a symptom of the db sharing a device with the data. If you have no dedicated devi
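
A quick way to confirm whether spillover is actually happening on OSDs that do have a dedicated DB device is the cluster health output; a minimal sketch, assuming osd.12 and a Nautilus-or-later release where the "bluefs stats" admin socket command is available:

    # BLUEFS_SPILLOVER appears as a health warning once DB data lands on the slow device
    ceph health detail | grep -i spillover
    # per-OSD view of how BlueFS space is spread across the fast and slow devices
    ceph daemon osd.12 bluefs stats

With no dedicated DB device there is nothing to spill over to, which is the point made above.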

[ceph-users] Re: Billions of objects upload with bluefs spillover cause osds down?

2021-09-27 Thread Janne Johansson
> 2. > If I remove the db device from the nvme with the ceph-objectstore-tool and > keep it with the block, would it still be an issue? I guess if they stay together > it cannot spill over anywhere. > I guess I need to compact the spilled-over disks before removing the db. If you don't use a WAL/DB device, you ar
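
For the compaction step mentioned here, a minimal sketch, assuming osd.12 is the OSD whose DB is about to be removed:

    # online RocksDB compaction on a running OSD
    ceph tell osd.12 compact
    # offline alternative while the OSD is stopped (path is the OSD data directory)
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact

Which of the two is appropriate depends on whether the OSD can stay up during the maintenance window.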

[ceph-users] Re: Tool to cancel pending backfills

2021-09-27 Thread Josh Baergen
> I have a question regarding the last step. It seems to me that the ceph > balancer is not able to remove the upmaps > created by pgremapper, but instead creates new upmaps to balance the pgs > among osds. The balancer will prefer to remove existing upmaps[1], but it's not guaranteed. The upmap
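
For inspecting what is currently pinned by upmaps, the osdmap itself is the place to look; a minimal sketch (the PG id 2.1f is only an example):

    # list the pg_upmap_items entries currently in the osdmap
    ceph osd dump | grep pg_upmap
    # drop a single upmap exception by hand if the balancer leaves it behind
    ceph osd rm-pg-upmap-items 2.1f

Removing an upmap lets CRUSH place the PG again, which may trigger backfill, so it is best done gradually.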

[ceph-users] Re: MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi Duncan. I've tried with a couple of different libraries. brew install osxfuse brew install macfuse brew install fuse But none of them helped with installation or connection for the machine that was able to build the client. Thank you for helping. Best regards Daniel On Mon, Sep 27, 2021 at

[ceph-users] Re: MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi Duncan. Great suggestion, thank you for the link. I've run it on both machines; on the M1 BigSur Mac it did not compile because it didn't have a FUSE:FUSE target, whatever that means. === Last 15 lines from /Users/danielp/Library/Logs/H

[ceph-users] Cephadm set rgw SSL port

2021-09-27 Thread Sergei Genchev
Hi, I need to deploy RGW with SSL and was looking at the page https://docs.ceph.com/en/pacific/cephadm/rgw/ I want rados gateway to listen on a custom port. My yaml file looks like this: service_type: rgw service_id: connectTest placement: hosts: - cv1xta-conctcephradosgw000 - cv1xta-conctce
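
For reference, the Pacific RGW service spec supports SSL directly; a minimal sketch of one possible shape, reusing the service_id above with a hypothetical port and an abbreviated certificate:

    service_type: rgw
    service_id: connectTest
    placement:
      hosts:
        - cv1xta-conctcephradosgw000
    spec:
      ssl: true
      rgw_frontend_port: 8443
      rgw_frontend_ssl_certificate: |
        -----BEGIN CERTIFICATE-----
        ...
        -----END CERTIFICATE-----

Applied with "ceph orch apply -i rgw.yaml", this should make the gateways listen with SSL on the custom port.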

[ceph-users] Re: Tool to cancel pending backfills

2021-09-27 Thread Peter Lieven
On 26.09.21 at 19:08, Alexandre Marangone wrote: > Thanks for the feedback Alex! If you have any issues or ideas for > improvements please do submit them on the GH repo: > https://github.com/digitalocean/pgremapper/ > > Last Thursday I did a Ceph at DO tech talk where I talked about how we use > pgremap

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-27 Thread Adam King
Unfortunately, I can't think of a workaround that doesn't involve a code change. I've created a tracker (https://tracker.ceph.com/issues/52745) and am working towards a fix for this, but I'm not sure how to deal with it using the current 16.2.6 image. Maybe others will have some ideas. On Mon, Sep

[ceph-users] MacOS Ceph Filesystem client

2021-09-27 Thread Daniel Persson
Hi, I'm running some tests on a couple of Mac Mini machines. One of them is an M1 with BigSur, and the other one is a regular Intel Mac with Catalina. I've tried to build Ceph Nautilus, Octopus, and Pacific multiple times with different parameters and added many dependencies to the systems, but hav

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Edward R Huyer
I also just ran into what seems to be the same problem Chris did. Despite all indicators visible to me saying my NVMe drive is non-rotational (including /sys/block/nvme0n1/queue/rotational), the Orchestrator would not touch it until I specified it by model. -Original Message- From: Eu
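
When chasing this kind of mismatch it helps to compare what the kernel reports with what ceph-volume (and therefore the orchestrator) believes; a minimal sketch, assuming the device is nvme0n1:

    # 0 means the kernel considers the device non-rotational
    cat /sys/block/nvme0n1/queue/rotational
    lsblk -d -o NAME,ROTA,MODEL /dev/nvme0n1
    # the view the orchestrator works from
    ceph-volume inventory /dev/nvme0n1

If the two disagree, the ceph-volume/cephadm side is the one the spec matching acts on.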

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Stefan Kooman
On 9/24/21 07:59, Szabo, Istvan (Agoda) wrote: Hi, I wonder how you guys do it, since we will always have a limitation on the network bandwidth of the load balancer. You might get rid of the load balancer entirely and use DNS-based load balancing. PowerDNS has a powerful feature for this [1]. Th
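
The PowerDNS feature referenced here is LUA records, which can hand out only the RGW endpoints that are actually answering; a minimal sketch with hypothetical addresses, assuming enable-lua-records is set in pdns.conf:

    ; zone file entry: return only the backends whose port 443 is up
    s3.example.com. 60 IN LUA A "ifportup(443, {'192.0.2.11', '192.0.2.12', '192.0.2.13'})"

Clients then spread across the healthy gateways through plain DNS resolution, with no load balancer in the data path.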

[ceph-users] is it possible to remove the db+wal from an external device (nvme)

2021-09-27 Thread Szabo, Istvan (Agoda)
Hi, It seems that in our config the NVMe device serving as wal+db in front of the SSDs is slowing down the SSD OSDs. I'd like to avoid rebuilding all the OSDs; is there a way to somehow migrate the wal+db to the "slower device" without reinstalling? Ty

[ceph-users] Re: S3 Bucket Notification requirement

2021-09-27 Thread Sanjeev Jha
Hi Yuval, I have changed the SNS signature version as suggested. I tried creating a simple SNS topic even without any attributes, but it does not seem to be working and I am getting the error below: [ansibleuser@ceprgw01 ~]$ aws --profile test sns create-topic --region=poc --name=mytopic --endpoint-url
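
For comparison, a minimal sketch of creating a topic against RGW's SNS-compatible API, with a hypothetical endpoint and AMQP push endpoint rather than the exact setup above:

    aws --profile test --endpoint-url http://ceprgw01:8080 --region default \
        sns create-topic --name mytopic \
        --attributes='{"push-endpoint": "amqp://rabbit-host:5672", "amqp-exchange": "ex1", "amqp-ack-level": "broker"}'

If the topic should only be created, with no notifications pushed anywhere yet, the --attributes part can be left out entirely, as attempted above.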

[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-27 Thread Edward R Huyer
I somehow missed your reply earlier, but yes, I think that's useful. I know how big my NVMe drives are (1920GB), and the maximum number of spinning disks a given node is going to have (12), so chopping up the NVMe drives shouldn't be hard. It's a bit suboptimal since all OSDs get the same DB s
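
Stating the split explicitly in the OSD spec avoids relying on the default equal-size behaviour; a minimal sketch under the assumptions above (12 HDDs per node, NVMe used for DB only; the service_id and device model are hypothetical):

    service_type: osd
    service_id: hdd_with_nvme_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        model: 'HYPOTHETICAL-NVME-MODEL'
      db_slots: 12

db_slots tells ceph-volume how many DB slots to carve per DB device; block_db_size is the alternative when a fixed per-OSD DB size is preferred.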

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Szabo, Istvan (Agoda)
Hi, How many RGW are you using for this huge cluster? Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com --- -Original Message- From: Svante Kar

[ceph-users] Re: 16.2.6 CEPHADM_REFRESH_FAILED New Cluster

2021-09-27 Thread Marco Pizzolo
Good morning Adam, Ceph users, Is there something we can do to extend the acceptable response size? Trying to understand if there is some viable workaround that we can implement. Thanks, Marco On Fri, Sep 24, 2021 at 2:59 PM Marco Pizzolo wrote: > Hi Adam, > > I really appreciate your time rev

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Eugen Block
Hi, just remove the wal_devices part from your specs, ceph-volume will automatically put the wal onto the same SSD. If you want to use both SSDs for DB I would also remove the "limit" filter so ceph-volume can use both SSDs for block.db. You don't seem to have more than those two SSDs per
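
Put together, the suggestion amounts to a spec along these lines; a minimal sketch assuming rotational data drives and the two SSDs for block.db, with no wal_devices section and no limit filter (the service_id is illustrative):

    service_type: osd
    service_id: hdd_osds_ssd_db
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0

With no wal_devices given, the WAL is co-located with the DB on the SSDs, which is the behaviour described above.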

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Chris
The lines I cited as examples of cephadm misinterpreting rotational states were pulled from the mgr container stderr, acquired via docker logs . Your comments on my deployment strategy are very helpful--I figured (incorrectly?) that having the db & wal on faster drives would benefit the overall th

[ceph-users] Re: change osdmap first_committed

2021-09-27 Thread Seena Fallah
I found a way to set it using: ceph-kvstore-tool rocksdb . set osdmap first_committed ver 12261 But is it safe to do that? =) On Mon, Sep 27, 2021 at 5:06 PM Seena Fallah wrote: > Hi, > > I've lost all my mon DBs, and after rebuilding them using the OSDs the osdmap > first_committed is set to 1, but my
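
Spelled out with a concrete (hypothetical) mon store path, and with the mon stopped first, the operation looks roughly like this; back up store.db before touching it:

    systemctl stop ceph-mon@a
    cp -a /var/lib/ceph/mon/ceph-a/store.db /root/store.db.bak
    ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-a/store.db \
        set osdmap first_committed ver 12261
    systemctl start ceph-mon@a

Hand-editing the mon store is last-resort surgery, hence the question about safety.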

[ceph-users] change osdmap first_committed

2021-09-27 Thread Seena Fallah
Hi, I've lost all my mon DBs, and after rebuilding them using the OSDs the osdmap first_committed is set to 1, but my osdmap commits start from 12261 (from a list of osdmaps in the mon DB), and now when the mon wants to trim osdmaps it fails because it can't find osdmap 1. Is there a way to change osdmap fi

[ceph-users] Re: Problem with adopting 15.2.14 cluster with cephadm on CentOS 7

2021-09-27 Thread Manuel Holtgrewe
Hi, it turns out that I was a bit confused. I had already upgraded my cluster to v15/octopus and was now incorrectly using the image for v14/nautilus, which of course doesn't work, as downgrades are not expected. Please ignore my email. Cheers, Manuel On Mon, Sep 27, 2021 at 11:52 AM Manuel Holtgrew

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-27 Thread Eugen Block
Hi, I think 'ceph-bluestore-tool bluefs-bdev-migrate' could be of use here. I haven't tried it in a production environment yet, only in virtual labs. Regards, Eugen Quoting "Szabo, Istvan (Agoda)": Hi, It seems that in our config the NVMe device serving as wal+db in front of the ssd slowi
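
A minimal sketch of that migration, assuming osd.12 and that the OSD is stopped while the device layout changes (the command moves the DB contents onto the main block device):

    systemctl stop ceph-osd@12
    ceph-bluestore-tool bluefs-bdev-migrate \
        --path /var/lib/ceph/osd/ceph-12 \
        --devs-source /var/lib/ceph/osd/ceph-12/block.db \
        --dev-target /var/lib/ceph/osd/ceph-12/block
    systemctl start ceph-osd@12

Depending on the release, the LVM tags on the OSD may also need updating afterwards so that ceph-volume no longer expects a separate DB device.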

[ceph-users] Re: ceph_add_cap: couldn't find snap realm 110

2021-09-27 Thread Eugen Block
Thanks, Luís! Leap 15.1 is EOL and the kernel isn't receiving any further fixes. Yes, we're aware of that. We're still discussing whether we should stay on RPMs or switch to cephadm and containers. Thanks again! Eugen Quoting Luis Henriques: On Mon, Sep 27, 2021 at 07:04:39AM +, Euge

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Svante Karlsson
I forgot, we actually have two per node; we run a separate cold storage pool as well with erasure-coded data, and each pool needs its own rgw (I think). But they are not load balanced or anything. On Mon, 27 Sep 2021 at 13:18, Svante Karlsson wrote: > > One per kubernetes node (currently 10x 64

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Svante Karlsson
One per kubernetes node (currently 10x 64-core nodes), plus one extra outside for non-kubernetes stuff. This works for the jobs that we currently run on each node. If they were to overload the on-node rgw we could add more rgws on the node and put a local haproxy in front; same principle. On Mon, 27

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Svante Karlsson
Hi Robert, I think it was because kubernetes jobs running in docker have their own definition of 127.0.0.1 (i.e. inside the container). On Mon, 27 Sep 2021 at 12:50, Robert Sander wrote: > > On 27.09.21 at 12:44, Svante Karlsson wrote: > > We added > > a common extra ip address in iptables with

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Robert Sander
On 27.09.21 at 12:44, Svante Karlsson wrote: We added a common extra IP address in iptables with a rule to map it to localhost. Finally each kubernetes job uses this common IP to reach the "local" rgw server. This way we skip two hops of network traffic to the real gateway, and this scales with the
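
One possible shape of such a rule, with a hypothetical shared address 192.0.2.100 and RGW listening on 8080 on each node (not the exact rule from this setup; DNAT to 127.0.0.1 for forwarded traffic also requires route_localnet):

    sysctl -w net.ipv4.conf.all.route_localnet=1
    # traffic from pods to the shared IP is redirected to the rgw bound on localhost
    iptables -t nat -A PREROUTING -d 192.0.2.100 -p tcp --dport 8080 \
        -j DNAT --to-destination 127.0.0.1:8080
    # same redirect for processes on the host itself
    iptables -t nat -A OUTPUT -d 192.0.2.100 -p tcp --dport 8080 \
        -j DNAT --to-destination 127.0.0.1:8080

Every node carries the same rule, so clients always hit the RGW on the node they run on.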

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Svante Karlsson
Hi Szabo, we have a 7PB cluster that only serves S3 content for read-heavy jobs running on a dedicated kubernetes cluster; all connections are 100G. We first overloaded the rgw gateways, and then the load balancers. The hackish solution we came up with is to add each kubernetes node as ceph members

[ceph-users] Re: ceph_add_cap: couldn't find snap realm 110

2021-09-27 Thread Luis Henriques
On Mon, Sep 27, 2021 at 07:04:39AM +, Eugen Block wrote: > Good morning, > > could anyone tell me if the patch [1] for this tracker issue [2] is already > available in any new (open)SUSE kernel (maybe Leap 15.3)? We seem to be > hitting [2] on openSUSE Leap 15.1 and if there's a chance to fix

[ceph-users] Re: Change max backfills

2021-09-27 Thread Stefan Kooman
On 9/24/21 23:51, David Orman wrote: With recent releases, 'ceph config' is probably a better option; do keep in mind this sets things cluster-wide. If you're just wanting to target specific daemons, then tell may be better for your use case. You can be specific with ceph config too. For exampl
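
A minimal sketch of both approaches for osd_max_backfills, cluster-wide versus a single daemon (the values are illustrative):

    # persistent, stored in the mon config database
    ceph config set osd osd_max_backfills 2
    ceph config set osd.17 osd_max_backfills 1        # only osd.17
    # runtime override on a running daemon
    ceph tell osd.17 config set osd_max_backfills 1
    # check what is in effect
    ceph config show osd.17 osd_max_backfills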

[ceph-users] Re: Problem with adopting 15.2.14 cluster with cephadm on CentOS 7

2021-09-27 Thread Eugen Block
Hi, the log states: 2021-09-27 10:47:20,415 DEBUG Could not locate podman: podman not found Have you verified that it's installed? Quoting Manuel Holtgrewe: Hi, I have a 15.2.14 ceph cluster running on an up-to-date CentOS 7 that I want to adopt into cephadm. I'm trying to follow this: h
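
A quick sanity check before retrying the adoption, assuming either podman or docker is acceptable as the container runtime (docker is the common choice on CentOS 7, whose podman packages are very old):

    command -v podman || command -v docker || echo "no container runtime found"
    cephadm check-host

cephadm check-host also verifies the other host prerequisites (systemd, time synchronization, and so on).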

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-09-27 Thread Eugen Block
Hi, I read your first email again and noticed that ceph-volume already identifies the drives sdr and sds as non-rotational and as available. That would also explain the empty rejected_reasons field, because they are not rejected (at this stage?). Where do you read the information that one

[ceph-users] Re: Error ceph-mgr on fedora 36

2021-09-27 Thread Sebastian Wagner
Looks like you should create a tracker issue for this: https://tracker.ceph.com/projects/mgr/issues/new On 18.09.21 at 14:34, Igor Savlook wrote: OS: Fedora 36 (rawhide) Ceph: 16.2.6 Python: 3.10 When ceph-mgr starts it tries to load the core pyth

[ceph-users] Re: RGW memory consumption

2021-09-27 Thread Martin Traxl
Hi there, a quick update on this issue. We finally found the memory leak to be in the STS part of Ceph. In the meantime there already was a bug ticket opened https://tracker.ceph.com/issues/52290 There is also a pull request fixing this issue, which we applied on our test environment and coul

[ceph-users] Problem with adopting 15.2.14 cluster with cephadm on CentOS 7

2021-09-27 Thread Manuel Holtgrewe
Hi, I have a 15.2.14 ceph cluster running on an up-to-date CentOS 7 that I want to adopt into cephadm. I'm trying to follow this: https://docs.ceph.com/en/pacific/cephadm/adoption/ However, I am failing to adopt the monitors. I've tried the process a couple of times, rolling back everything thr

[ceph-users] Re: Remoto 1.1.4 in Ceph 16.2.6 containers

2021-09-27 Thread Sebastian Wagner
Thank you David! On 24.09.21 at 00:41, David Galloway wrote: I just repushed the 16.2.6 container with remoto 1.2.1 in it. On 9/22/21 4:19 PM, David Orman wrote: https://bodhi.fedoraproject.org/updates/FEDORA-EPEL-2021-4b2736a28c ^^ if people want to test and provide feedback for a potential

[ceph-users] Re: How you loadbalance your rgw endpoints?

2021-09-27 Thread Sebastian Wagner
Hi Szabo, I think you can have a look at https://docs.ceph.com/en/latest/cephadm/rgw/#high-availability-service-for-rgw even if you don't deploy ceph using cephadm. On 24.09.21 at 07:59, Szabo, Istvan (Ag
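
The high-availability mechanism behind that link is the cephadm "ingress" service (haproxy plus keepalived in front of the RGW daemons); a minimal sketch with hypothetical names and addresses:

    service_type: ingress
    service_id: rgw.mystore
    placement:
      count: 2
    spec:
      backend_service: rgw.mystore
      virtual_ip: 203.0.113.10/24
      frontend_port: 443
      monitor_port: 1967

The same haproxy/keepalived layout can be built by hand on clusters that are not managed by cephadm.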

[ceph-users] Re: Restore OSD disks damaged by deployment misconfiguration

2021-09-27 Thread Sebastian Wagner
Hi Phil, On 27.09.21 at 10:06, Phil Merricks wrote: Hey folks, A recovery scenario I'm looking at right now is this: 1: In a clean 3-node Ceph cluster (pacific, deployed with cephadm), the OS disk is lost from all nodes 2: Trying to be helpful, a self-healing deployment system reinstalls the

[ceph-users] Restore OSD disks damaged by deployment misconfiguration

2021-09-27 Thread Phil Merricks
Hey folks, A recovery scenario I'm looking at right now is this: 1: In a clean 3-node Ceph cluster (pacific, deployed with cephadm), the OS disk is lost from all nodes 2: Trying to be helpful, a self-healing deployment system reinstalls the OS on each node, and rebuilds the ceph services 3: Somew
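
If the OSD data disks themselves were untouched, the LVM metadata that ceph-volume left on them is normally enough to bring the OSDs back once a cluster with the same fsid and keyrings exists again; a hedged sketch of the first things to check, not a full recovery procedure:

    # show the OSD logical volumes that survive on the data disks
    ceph-volume lvm list
    # recreate the tmpfs mounts and start every OSD that can be activated
    ceph-volume lvm activate --all

Whether this applies here depends on how badly the deployment misconfiguration actually damaged the data disks.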

[ceph-users] ceph_add_cap: couldn't find snap realm 110

2021-09-27 Thread Eugen Block
Good morning, could anyone tell me if the patch [1] for this tracker issue [2] is already available in any new (open)SUSE kernel (maybe Leap 15.3)? We seem to be hitting [2] on openSUSE Leap 15.1 and if there's a chance to fix it by upgrading the kernel it would be great news! Thanks! Eug