[ceph-users] CephFS optimized for machine learning workload

2021-09-15 Thread Yan, Zheng
The following PRs are optimizations we (Kuaishou) made for machine learning workloads (randomly reading billions of small files). [1] https://github.com/ceph/ceph/pull/39315 [2] https://github.com/ceph/ceph/pull/43126 [3] https://github.com/ceph/ceph/pull/43125 The first PR adds an option that disables
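
For anyone who wants to try these changes before they are merged, the PR heads can be fetched directly from GitHub for a local test build. A minimal sketch; the local branch names are arbitrary:

```
# Fetch the PRs referenced above as local branches (PR numbers from the message).
git clone https://github.com/ceph/ceph.git
cd ceph
git fetch origin pull/39315/head:pr-39315
git fetch origin pull/43126/head:pr-43126
git fetch origin pull/43125/head:pr-43125
git checkout pr-39315
```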

[ceph-users] Re: Health check failed: 1 pools full

2021-09-15 Thread Eugen Block
Hi Frank, I think the snapshot rotation could be an explanation. Just a few days ago we had a host failure over night and some OSDs couldn't be rebalanced entirely because they were too full. Deleting a few (large) snapshots I created last week resolved the issue. If you monitored 'ceph osd
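
The monitoring Eugen alludes to can be done with the standard CLI. A minimal sketch; nothing here is specific to Frank's cluster:

```
# Check per-pool and per-OSD utilization against the configured full ratios.
ceph df detail               # per-pool usage and quotas
ceph osd df tree             # per-OSD utilization, grouped by host
ceph osd dump | grep ratio   # configured full/backfillfull/nearfull ratios
```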

[ceph-users] Re: Questions about multiple zonegroups (was Problem with multi zonegroup configuration)

2021-09-15 Thread Boris Behrens
Ok, I think I found the basic problem. I used to talk to the endpoint that is also the domain for the s3 websites. After switching the domains around, everything worked fine. :partyemote: I have written down what I think about how things work together (wrote it down here IYAI https://pastebin.com/6Gj9Q5hJ), an
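
For reference, zonegroup endpoints can be inspected and changed with radosgw-admin. A hedged sketch, assuming a zonegroup named "eu" and a placeholder URL, not Boris' actual setup:

```
# Inspect the current endpoints, change them, then publish the new period.
radosgw-admin zonegroup get --rgw-zonegroup=eu
radosgw-admin zonegroup modify --rgw-zonegroup=eu \
    --endpoints=https://s3.example.com
radosgw-admin period update --commit
```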

[ceph-users] Re: OSD Service Advanced Specification db_slots

2021-09-15 Thread Eugen Block
Hi, db_slots is still not implemented: pacific:~ # ceph orch apply -i osd.yml --dry-run Error EINVAL: Failed to validate Drive Group: Filtering for is not supported Question 2: If db_slots still *doesn't* work, is there a coherent way to divide up a solid state DB drive for use by a bun
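
Since db_slots is rejected by spec validation, one possible workaround (an assumption on my part, not necessarily the thread's answer) is to pin a fixed DB size per OSD with block_db_size, assuming that field is accepted by the spec on this release:

```
# Illustrative values only; block_db_size carves fixed-size DB volumes
# off the shared solid-state device instead of counting slots.
cat > osd.yml <<'EOF'
service_type: osd
service_id: osd_fixed_db_size
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
db_devices:
  rotational: 0
block_db_size: '64G'
EOF
ceph orch apply -i osd.yml --dry-run
```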

[ceph-users] Re: Health check failed: 1 pools full

2021-09-15 Thread Frank Schilder
It happened again today: 2021-09-15 04:25:20.551098 [INF] Health check cleared: POOL_NEAR_FULL (was: 1 pools nearfull) 2021-09-15 04:19:01.512425 [INF] Health check cleared: POOL_FULL (was: 1 pools full) 2021-09-15 04:19:01.512389 [WRN] Health check failed: 1 pools nearfull (POOL_NEAR_FULL)

[ceph-users] Re: Docker & CEPH-CRASH

2021-09-15 Thread Eugen Block
Hi, ceph-crash services are standalone containers; they are not running inside other containers: host1:~ # ceph orch ls NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID

[ceph-users] Re: Docker & CEPH-CRASH

2021-09-15 Thread Guilherme Geronimo
Got it: one instance per host is enough. In my case, I'm not using "ceph orch"; we did it manually, crafting one docker-compose.yml per host. The question is: is it possible to run a "crash instance" per host, or does the solution oblige me to adopt the cephadm solution? Thanks! []'s Arthur On 1
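
An untested sketch of what a manual, per-host crash container might look like, assuming the stock image ships the ceph-crash binary and that /etc/ceph carries a usable ceph.conf and keyring:

```
# Create a crash key for this host ("profile crash" is the cap profile cephadm
# itself uses for its crash daemons) and store it where the CLI will find it.
ceph auth get-or-create client.crash.$(hostname) \
    mon 'profile crash' mgr 'profile crash' \
    -o /etc/ceph/ceph.client.crash.$(hostname).keyring

# Run ceph-crash in a plain container; --hostname matters because ceph-crash
# derives the client name (client.crash.<hostname>) from the hostname it sees.
docker run -d --name ceph-crash \
    --hostname "$(hostname)" \
    -v /etc/ceph:/etc/ceph:ro \
    -v /var/lib/ceph/crash:/var/lib/ceph/crash \
    quay.io/ceph/ceph:v16.2.5 \
    ceph-crash
```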

[ceph-users] OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Davíð Steinn Geirsson
Hi, I rebooted one of my ceph nodes this morning after OS updates. No ceph packages were upgraded. After reboot, 4 out of 12 OSDs on this host refuse to start, giving errors: ``` Sep 15 14:59:25 janky ceph-osd[12384]: 2021-09-15T14:59:24.994+ 7f418196ef00 -1 bluestore(/var/lib/ceph/osd/ceph-0
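
Typical first diagnostics for an OSD that fails to mount BlueFS (not a fix); osd.0 and its path are placeholders taken from the log excerpt, and the unit name assumes the packaged, non-cephadm layout:

```
# The OSD must be stopped before running the tool against its store.
systemctl stop ceph-osd@0
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0
```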

[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Davíð Steinn Geirsson
Just realised the debug paste I sent was for OSD 5 but the other info is for OSD 0. They are both having the same issue, but for completeness' sake here is the debug output from OSD 0: http://paste.debian.net/1211873/ All daemons in the cluster are running Ceph Pacific 16.2.5. Regards, Davíð On W

[ceph-users] Re: Smarter DB disk replacement

2021-09-15 Thread Ján Senko
M.2 was not designed for hot swap, and Icydock's solution is a bit outside the specification. I really like the new Supermicro box (610P) that has 12 spinning disks and then 6 NVMes: 2 of them in 2.5"x7mm format and 4 of them in the new E1.S format. E1.S is practically next-gen hot-plug M.2. Ján Senk

[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Stefan Kooman
On 9/15/21 18:06, Davíð Steinn Geirsson wrote: Just realised the debug paste I sent was for OSD 5 but the other info is for OSD 0. They are both having the same issue, but for completeness sake here is the debug output from OSD 0: http://paste.debian.net/1211873/ All daemons in the cluster are r

[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Davíð Steinn Geirsson
Hi, On Wed, Sep 15, 2021 at 08:39:11PM +0200, Stefan Kooman wrote: > On 9/15/21 18:06, Davíð Steinn Geirsson wrote: > > Just realised the debug paste I sent was for OSD 5 but the other info is for > > OSD 0. They are both having the same issue, but for completeness sake here > > is the debug outpu

[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Stefan Kooman
On 9/15/21 21:02, Davíð Steinn Geirsson wrote: Hi, On Wed, Sep 15, 2021 at 08:39:11PM +0200, Stefan Kooman wrote: On 9/15/21 18:06, Davíð Steinn Geirsson wrote: Just realised the debug paste I sent was for OSD 5 but the other info is for OSD 0. They are both having the same issue, but for comp

[ceph-users] Re: OSDs unable to mount BlueFS after reboot

2021-09-15 Thread Davíð Steinn Geirsson
On Wed, Sep 15, 2021 at 09:16:17PM +0200, Stefan Kooman wrote: > On 9/15/21 21:02, Davíð Steinn Geirsson wrote: > > Hi, > > > > On Wed, Sep 15, 2021 at 08:39:11PM +0200, Stefan Kooman wrote: > > > On 9/15/21 18:06, Davíð Steinn Geirsson wrote: > > > > Just realised the debug paste I sent was for O

[ceph-users] Re: BLUEFS_SPILLOVER

2021-09-15 Thread Janne Johansson
On Thu, 16 Sep 2021 at 06:28, Szabo, Istvan (Agoda) wrote: > > Hi, > > Something weird is happening: I have 1 NVMe drive and 3x SSDs being used for > WAL and DB. > The LVM is 596 GB, but the health detail says x GiB spilled over to the slow > device, even though only 317 GB are used :/ > > [WRN] BL
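
For inspecting spillover on a single OSD, a hedged sketch; osd.12 is a placeholder, and `ceph daemon` must be run on the host holding that OSD's admin socket:

```
# The bluefs perf counters show how much of RocksDB sits on the fast vs. slow device.
ceph daemon osd.12 perf dump | \
    jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'
# Manual compaction sometimes shrinks the DB enough to pull data back off
# the slow device:
ceph tell osd.12 compact
```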

[ceph-users] Re: Docker & CEPH-CRASH

2021-09-15 Thread Eugen Block
I haven't tried it myself, but it would probably work to run the crash services apart from cephadm; maybe someone else has a similar setup. Since those are not critical services, you can try it without impacting the rest of the cluster. But again, I haven't tried it this way. Quoting Guilh