[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread huxia...@horebdata.cn
Indeed, scaling up the number of PGs per OSD may be needed for larger HDDs. Increasing the number of PGs by 5- or 10-fold would have an adverse impact on OSD peering. What are the practical limits on the number of PGs per OSD with default settings, or should we tune some Ceph defaults for
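
The soft cap in question is the mon_max_pg_per_osd option (roughly 250 per OSD by default on recent releases); a minimal sketch of checking and raising it cluster-wide, assuming the centralized config database is in use:

  # show the current per-OSD PG cap
  ceph config get mon mon_max_pg_per_osd
  # raise it, e.g. to allow denser PG layouts on large HDDs
  ceph config set global mon_max_pg_per_osd 500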

[ceph-users] Re: Some confusion around PG, OSD and balancing issue

2021-03-13 Thread Darrin Hodges
HI Matthew, the results of the commands are:

ceph df detail
--- RAW STORAGE ---
CLASS  SIZE     AVAIL   USED     RAW USED  %RAW USED
hdd    190 TiB  61 TiB  129 TiB  129 TiB   67.70
TOTAL  190 TiB  61 TiB  129 TiB  129 TiB   67.70

--- POOLS ---
POOL   ID  PGS

[ceph-users] Safe to remove osd or not? Which statement is correct?

2021-03-13 Thread Szabo, Istvan (Agoda)
Hi Gents, There is a cluster with 14 hosts in this state: https://i.ibb.co/HPF3Pdr/6-ACB2-C5-B-6-B54-476-B-835-D-227-E9-BFB1247.jpg There is a host-based EC 3:1 crush rule, and there are 3 hosts where OSDs are down. Unfortunately there are also pools with 3 replicas, which are host-based as well. 2
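
The "safe or not" question can also be put to the cluster directly; a minimal sketch (12 is a placeholder OSD id, not from the post):

  # will stopping this OSD leave any PG unable to serve I/O?
  ceph osd ok-to-stop 12
  # is every PG on this OSD fully recovered elsewhere, so it can be destroyed?
  ceph osd safe-to-destroy 12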

[ceph-users] Re: OSD id 241 != my id 248: conversion from "ceph-disk" to "ceph-volume simple" destroys OSDs

2021-03-13 Thread Frank Schilder
Sorry if anyone gets this twice. It didn't make it to the list. -- Frank From: Frank Schilder Sent: 12 March 2021 13:48 To: Chris Dunlop Cc: ceph-users@ceph.io; Wissem MIMOUNA Subject: Re: [ceph-users] OSD id 241 != my id 248: conversion from "ceph-disk"

[ceph-users] cephadm and ha service for rgw

2021-03-13 Thread Seba chanel
Hi everyone, I am trying to configure the HA service for rgw with cephadm. I have 2 rgw daemons, on cnrgw1 and cnrgw2, for the same pool. I use a virtual IP address 192.168.0.15 (cnrgwha) and the config from https://docs.ceph.com/en/latest/cephadm/rgw/#high-availability-service-for-rgw # from root@cnrgw1 [root@cnrgw1
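
The linked docs page builds HA from an ingress service (haproxy + keepalived) in front of the rgw service; a sketch of such a spec using the poster's hosts and VIP (the service id, ports and /24 prefix are assumptions):

  # rgw-ingress.yaml
  service_type: ingress
  service_id: rgw.default          # name of the backend rgw service
  placement:
    hosts:
      - cnrgw1
      - cnrgw2
  spec:
    backend_service: rgw.default   # existing rgw service to load-balance
    virtual_ip: 192.168.0.15/24    # cnrgwha
    frontend_port: 8080            # port clients connect to
    monitor_port: 1967             # haproxy status port

  ceph orch apply -i rgw-ingress.yaml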

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Martin Verges
> So perhaps we'll need to change the OSD to allow for 500 or 1000 PGs We had a support case last year where we were forced to set the OSD limit to >4000 for a few days, and had more than 4k active PGs on that single OSD. You can do that, however it is quite uncommon. -- Martin Verges Managing
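
The limit referred to here is presumably mon_max_pg_per_osd, together with the OSD-side hard ceiling of mon_max_pg_per_osd * osd_max_pg_per_osd_hard_ratio (the ratio is 3.0 by default); a sketch of loosening it temporarily:

  # allow far more PGs per OSD than the default cap
  ceph config set global mon_max_pg_per_osd 1500
  # and/or raise the hard ratio itself
  ceph config set osd osd_max_pg_per_osd_hard_ratio 5
  # confirm what a given OSD daemon actually sees (osd.7 is a placeholder id)
  ceph config show osd.7 | grep pg_per_osd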

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Dan van der Ster
OK. Btw, you might need to fail over to a new mgr... I'm not sure if the current active will read that new config. .. dan On Sat, Mar 13, 2021, 4:36 PM Boris Behrens wrote: > Hi, > > ok thanks. I just changed the value and reweighted everything back to 1. > Now I let it sync the weekend and check
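
Failing over to a standby mgr so the balancer re-reads the setting is a one-liner; a sketch (the daemon name is an example):

  # show the currently active mgr
  ceph mgr stat
  # fail the active one so a standby takes over with the fresh config
  ceph mgr fail ceph-mgr-a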

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
Hi, ok thanks. I just changed the value and reweighted everything back to 1. Now I'll let it sync over the weekend and check how it looks on Monday. We tried to keep the total storage of the systems as balanced as possible. New systems will come with 8TB disks, but for the existing ones we added 16TB to offset the
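
"Reweighted back to 1" here is presumably the temporary override reweight rather than the CRUSH weight; a sketch (69 is one of the OSD ids mentioned later in the thread, used as an example):

  # reset the override reweight so only the CRUSH weight determines placement
  ceph osd reweight 69 1.0
  # then watch per-OSD utilisation with
  ceph osd df tree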

[ceph-users] Re: Removing secondary data pool from mds

2021-03-13 Thread Frank Schilder
Dear Michael, good to hear that it is over. I'm a bit surprised and also worried that you lost data again. Was the cluster rebalancing when the restarts happened? I had OSDs restart all over the place due to bugs, OOM or admin accidents and never lost anything (except data access for a

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Dan van der Ster
Thanks. Decreasing the max deviation to 2 or 1 should help in your case. This option controls when the balancer stops trying to move PGs around -- by default it stops when the deviation from the mean is 5. Yes this is too large IMO -- all of our clusters have this set to 1. And given that you
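
In command form, that recommendation comes down to the following (a sketch, combining the setting quoted elsewhere in the thread with a status check):

  # only stop balancing once every OSD is within 1 PG of the mean
  ceph config set mgr mgr/balancer/upmap_max_deviation 1
  # check balancer mode/state and score the current distribution
  ceph balancer status
  ceph balancer eval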

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
Hi Dan, upmap_max_deviation is at the default (5) in our cluster. Is 1 the recommended deviation? I added the whole ceph osd df tree (I need to remove some OSDs and re-add them as bluestore with SSD, so 69, 73 and 82 are a bit off now. I also reweighted to try to mitigate the %USE imbalance). I will

[ceph-users] Re: should I increase the amount of PGs?

2021-03-13 Thread Dan van der Ster
No, increasing num PGs won't help substantially. Can you share the entire output of ceph osd df tree ? Did you already set ceph config set mgr mgr/balancer/upmap_max_deviation 1 ?? And I recommend debug_mgr 4/5 so you can see some basic upmap balancer logging. .. Dan On Sat, Mar 13,

[ceph-users] should I increase the amount of PGs?

2021-03-13 Thread Boris Behrens
Hello people, I am still struggling with the balancer (https://www.mail-archive.com/ceph-users@ceph.io/msg09124.html). Now I've read some more and think that I might not have enough PGs. Currently I have 84 OSDs and 1024 PGs for the main pool (3008 total). I have the autoscaler enabled, but I
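
A quick way to compare the current 1024 PGs with what the autoscaler would suggest (a sketch; the pool name is a placeholder):

  # per-pool table of current vs. suggested pg_num
  ceph osd pool autoscale-status
  # current pg_num/pgp_num of the main pool
  ceph osd pool get mainpool pg_num
  ceph osd pool get mainpool pgp_num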

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Dan van der Ster
On Fri, Mar 12, 2021 at 6:35 PM Robert Sander wrote: > > On 12.03.21 at 18:30, huxia...@horebdata.cn wrote: > > > Any other aspects on the limits of bigger-capacity hard disk drives? > > Recovery will take longer, increasing the risk of another failure during > that time. > Another limitation is

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Martin Verges
If you have a small cluster without host redundancy, you are still able to configure Ceph to handle this correctly by adding a drive failure domain between the host and OSD levels. So yes, you need to change more than just failure-domain=OSD, as that alone would be a problem. However it is absolutely
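
One way to express such a drive-level failure domain is the generic CRUSH-map workflow sketched below (not necessarily what Martin ships: add a bucket type between osd and host, put each dual-actuator drive's two OSDs under one bucket of that type, and have the rule choose leaves of that type):

  # export and decompile the current CRUSH map
  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt: add a type (e.g. 'drive') between osd and host, create one
  # 'drive' bucket per physical disk containing its two OSDs, and use a rule
  # with: step chooseleaf firstn 0 type drive
  crushtool -c crush.txt -o crush.new.bin
  ceph osd setcrushmap -i crush.new.bin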

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Martin Verges
> failure-domain=host Yes (or rack/room/datacenter/..); for regular clusters it's therefore absolutely no problem, as you correctly assumed. -- Martin Verges Managing director Mobile: +49 174 9335695 E-Mail: martin.ver...@croit.io Chat: https://t.me/MartinVerges croit GmbH, Freseniusstr. 31h,

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Marc
> Well, if you run with failure-domain=host, then if it says "I have 8 > 14TB drives and one failed" or "I have 16 7TB drives and two failed" > isn't going to matter much in terms of recovery, is it? > It would mostly matter for failure-domain=OSD, otherwise it seems about > equal. Yes, but

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Janne Johansson
On Sat, 13 March 2021 at 12:56, Marc wrote: > > A good mix of size and performance is the Seagate 2X14 MACH.2 Dual > Actuator 14TB HDD. > > This drive reports as 2x 7TB individual block devices and you install > an OSD on each. > > My first thought was, wow, quite nice that this dual exposes itself as

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Marc
> > A good mix of size and performance is the Seagate 2X14 MACH.2 Dual > Actuator 14TB HDD. > This drive reports as 2x 7TB individual block devices and you install > an OSD on each. My first thought was, wow, quite nice that this dual exposes itself as two drives. I was always under the impression that

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread huxia...@horebdata.cn
Thanks a lot for the insightful comments. huxia...@horebdata.cn From: Janne Johansson Date: 2021-03-13 11:36 To: huxia...@horebdata.cn CC: ceph-users Subject: Re: [ceph-users] How big an OSD disk could be? On Fri, 12 March 2021 at 18:10, huxia...@horebdata.cn wrote: > Dear cephers, > Just

[ceph-users] Re: How big an OSD disk could be?

2021-03-13 Thread Janne Johansson
On Fri, 12 March 2021 at 18:10, huxia...@horebdata.cn wrote: > Dear cephers, > Just wondering how big an OSD disk could be. Currently the biggest HDD has a > capacity of 18TB or 20TB. Is it still suitable for an OSD? > Is there a limit on the capacity of a single OSD? Can it be 30TB, 50TB > or

[ceph-users] Re: Location of Crush Map and CEPH metadata

2021-03-13 Thread Anthony D'Atri
As Nathan describes, this information is maintained in the database on mon / monitor nodes. One always runs multiple mons in production, at least 3 and commonly 5. Each has a full copy of everything, so that the loss of a node does not lose data or impact operation. BTW, it’s Ceph not CEPH
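
Those cluster maps can be inspected straight from the mons; a minimal sketch:

  # monitor membership and quorum
  ceph mon dump
  ceph quorum_status
  # maps kept in the mon database: OSD map and CRUSH map
  ceph osd dump
  ceph osd crush dump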