[ceph-users] crashing OSDs: ceph_assert(h->file->fnode.ino != 1)

2020-05-28 Thread Harald Staub
This is again about our bad cluster, with too many objects, where the hdd OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e. 3 GB usable). Now several OSDs do not come up any more. Typical error message: /build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED ceph_assert(h->fil
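For readers hitting the same assert, a minimal sketch of how a down OSD's BlueFS DB can be inspected and grown, assuming OSD id 0, the default data path and an LVM-backed DB volume (all placeholders, not from the original report):

$ ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-0    # show DB/WAL devices and how full they are
$ lvextend -L +20G /dev/ceph-db-vg/db-lv                                   # enlarge the underlying DB LV first
$ ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-0   # let BlueFS pick up the new size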

[ceph-users] Repo for Nautilus packages for CentOS8

2020-05-28 Thread Massimo Sgaravatto
Hi What repo is supposed to be used for Nautilus packages on CentOS8? The documentation says to use https://download.ceph.com/rpm-nautilus, but https://download.ceph.com/rpm-nautilus/el8 is empty. I see some packages in: http://mirror.centos.org/centos/8/storage/x86_64/ceph-nautilus/Packa
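Purely as an illustration of how the CentOS Storage SIG mirror above could be consumed (repo file name and gpgcheck setting are assumptions, not an official recommendation):

$ cat > /etc/yum.repos.d/centos-ceph-nautilus.repo <<'EOF'   # as root
[centos-ceph-nautilus]
name=CentOS-8 Storage SIG - Ceph Nautilus
baseurl=http://mirror.centos.org/centos/8/storage/x86_64/ceph-nautilus/
enabled=1
gpgcheck=0
EOF
$ dnf install ceph-common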

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-28 Thread KervyN
Hi Eugen, no. The mgr services are located on our mon servers. This happens when I reboot any OSD node. Cheers Boris > > On 29.05.2020 at 08:40, Eugen Block wrote: > > Is the MGR service colocated on that OSD node? > > > Quoting Boris Behrens: >> Dear people on this mailing list

[ceph-users] Re: MAX AVAIL goes up when I reboot an OSD node

2020-05-28 Thread Eugen Block
Is the MGR service colocated on that OSD node? Quoting Boris Behrens: Dear people on this mailing list, I've got the "problem" that our MAX AVAIL value increases by about 5-10 TB when I reboot a whole OSD node. After the reboot the value goes back to normal. I would love to know WHY. Un

[ceph-users] Re: Recover UUID from a partition

2020-05-28 Thread Eugen Block
If you're running with ceph-volume you could try 'ceph-volume lvm trigger/activate {vg}/{lv}' and see if that helps. Quoting "Szabo, Istvan (Agoda)": Hi, Is there a way to recover the UUID from the partition? Someone mapped it in fstab to /sd* instead of by UUID and all the metadata is gone. The data is
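A minimal sketch of that ceph-volume path, with placeholder OSD id, fsid and partition (take the real values from the lvm list output):

$ ceph-volume lvm list                                                    # prints osd id and osd fsid for every ceph LV
$ ceph-volume lvm activate 3 0b1c2d3e-aaaa-bbbb-cccc-123456789abc        # or: ceph-volume lvm activate --all
$ blkid /dev/sdb1                                                         # if the partition UUID is still needed for fstab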

[ceph-users] Re: The sufficient OSD capabilities to enable write access on cephfs

2020-05-28 Thread Derrick Lin
I did some brute-force experiments and found that the following setting works for me: caps osd = "allow rw pool=cephfs_data" I am not sure why the ceph fs authorize command sets the caps the way it does, and for what purpose... Cheers On Fri, May 29, 2020 at 12:03 PM Derrick Lin wrote: > Hi guys, > > I have a Ceph
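For reference, 'ceph fs authorize cephfs client.foo / rw' normally produces the tag-based OSD cap 'allow rw tag cephfs data=cephfs', which relies on the data pool carrying the cephfs application tag; the explicit pool cap can be set by hand like this (client, fs and pool names are placeholders):

$ ceph auth caps client.foo \
    mon 'allow r' \
    mds 'allow rw' \
    osd 'allow rw pool=cephfs_data'
$ ceph auth get client.foo        # verify the resulting caps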

[ceph-users] Ceph Tech Talk: What's New In Octopus

2020-05-28 Thread Mike Perez
Hi everyone, We had a Ceph Tech Talk on May 28th at 17:00 UTC from Josh Durgin and Lenz Grimmer on a summary of the new features and enhancements with the new Octopus release. You can watch the recording here: https://www.youtube.com/watch?list=PLrBUGiINAakM36YJiTT0qYepZTVncFDdc&v=UGU-rGiEex

[ceph-users] Re: Cache pools at or near target size but no evict happen

2020-05-28 Thread icy chan
Hi Eugen, Sorry for the missing information. "cached-hdd-cache" is the overlay tier of "cached-hdd" and configured in "readproxy" mode. $ ceph osd dump | grep cached-hdd pool 24 'cached-hdd' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode warn

[ceph-users] The sufficient OSD capabilities to enable write access on cephfs

2020-05-28 Thread Derrick Lin
Hi guys, I have a Ceph cluster up and running and cephfs created (all done by ceph-ansible). I followed the guide to mount the volume on CentOS7 via FUSE. When I mount the volume as the default admin (client.admin), everything works fine, just like a normal file system. Then I created a new clien
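A hedged sketch of the FUSE mount with a non-admin client (client name, monitor address and mountpoint are placeholders; the keyring is assumed to be in /etc/ceph/):

$ sudo ceph-fuse --id foo -m mon1:6789 /mnt/cephfs   # picks up /etc/ceph/ceph.client.foo.keyring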

[ceph-users] Re: Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-05-28 Thread Marco Pizzolo
Rebooting addressed it. On Thu, May 28, 2020 at 4:52 PM Marco Pizzolo wrote: > Hello, > > Hitting an issue with a new 15.2.2 deployment using cephadm. I am having > a problem creating encrypted OSDs with 2 OSDs per device (they are NVMe). > > After removing and bootstrapping the cluster again, I am

[ceph-users] PGs degraded after osd restart

2020-05-28 Thread Chad William Seys
You may have run into this bug: https://tracker.ceph.com/issues/44286

[ceph-users] Re: [ceph-users]: Ceph Nautilus not working after setting MTU 9000

2020-05-28 Thread Dave Hall
Hello. A few days ago I offered to share the notes I've compiled on network tuning.  Right now it's a Google Doc: https://docs.google.com/document/d/1nB5fzIeSgQF0ti_WN-tXhXAlDh8_f8XF9GhU7J1l00g/edit?usp=sharing I've set it up to allow comments and I'd be glad for questions and feedback.  If
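Independent of the doc, a quick end-to-end check that often catches broken MTU 9000 setups (interface and peer address are placeholders):

$ ip link set dev eth0 mtu 9000
$ ping -M do -s 8972 -c 3 10.0.0.2   # 8972 bytes payload + 28 bytes headers = 9000; must succeed without fragmenting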

[ceph-users] Re: 15.2.2 bluestore issue

2020-05-28 Thread Josh Durgin
Yeah, we'll make sure the container images are built before announcing it. On 5/28/20 1:30 PM, David Orman wrote: Due to the impact/severity of this issue, can we make sure the docker images are pushed simultaneously for those of us using cephadm/containers (with the last release, there was a s

[ceph-users] Re: No scrubbing during upmap balancing

2020-05-28 Thread Vytenis A
Forgot to mention the Ceph version we're running: Nautilus 14.2.9 On Fri, May 29, 2020 at 12:44 AM Vytenis A wrote: > > Hi list, > > We have the balancer plugin in upmap mode running for a while now: > > health: HEALTH_OK > > pgs: > 1973 active+clean > 194 active+remapped+backfilling > 73

[ceph-users] PGs degraded after osd restart

2020-05-28 Thread Vytenis A
Hi cephists, We have a 10 node cluster running Nautilus 14.2.9. All objects are on an EC pool. We have the mgr balancer plugin in upmap mode doing its rebalancing: health: HEALTH_OK pgs: 1985 active+clean 190 active+remapped+backfilling 65 active+remapp

[ceph-users] No scrubbing during upmap balancing

2020-05-28 Thread Vytenis A
Hi list, We have the balancer plugin in upmap mode running for a while now: health: HEALTH_OK pgs: 1973 active+clean 194 active+remapped+backfilling 73 active+remapped+backfill_wait recovery: 588 MiB/s, 343 objects/s Our objects are stored on an EC pool. We got a PG_NOT_DEEP_SCRUBBED
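If the missed deep scrubs are only due to the ongoing backfill, the relevant knob is osd_scrub_during_recovery, which defaults to false; a hedged sketch for Nautilus (whether that is actually the cause here is an assumption):

$ ceph config show osd.0 | grep osd_scrub_during_recovery   # current value on a running OSD
$ ceph config set osd osd_scrub_during_recovery true        # allow scrubs while recovery/backfill is active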

[ceph-users] MAX AVAIL goes up when I reboot an OSD node

2020-05-28 Thread Boris Behrens
Dear people on this mailing list, I've got the "problem" that our MAX AVAIL value increases by about 5-10 TB when I reboot a whole OSD node. After the reboot the value goes back to normal. I would love to know WHY. Under normal circumstances I would ignore this behavior, but because I am very n
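Not an answer, but one way to narrow it down is to capture the per-OSD numbers that feed into MAX AVAIL before and during such a reboot:

$ ceph df detail      # pool-level MAX AVAIL
$ ceph osd df tree    # per-host / per-OSD utilization and weights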

[ceph-users] Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-05-28 Thread Marco Pizzolo
Hello, Hitting an issue with a new 15.2.2 deployment using cephadm. I am having a problem creating encrypted OSDs with 2 OSDs per device (they are NVMe). After removing and bootstrapping the cluster again, I am unable to create OSDs as they're locked. sgdisk, wipefs and zap all fail to leave the drive
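For reference, the wipe sequence referred to above, with a placeholder device name; this is just the spelled-out form of the commands mentioned, not a confirmed fix (per the follow-up, a reboot is what finally released the devices):

$ sgdisk --zap-all /dev/nvme0n1
$ wipefs -a /dev/nvme0n1
$ ceph-volume lvm zap /dev/nvme0n1 --destroy   # also tears down leftover LVM metadata from the previous deployment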

[ceph-users] Re: 15.2.2 bluestore issue

2020-05-28 Thread David Orman
Due to the impact/severity of this issue, can we make sure the docker images are pushed simultaneously for those of us using cephadm/containers (with the last release, there was a significant delay)? I'm glad the temp fix is being put in place in short order; thank you for the quick turnaround

[ceph-users] Re: 15.2.2 bluestore issue

2020-05-28 Thread Josh Durgin
Hi Paul, we're planning to release 15.2.3 with the workaround [0] tomorrow, so folks don't have to worry as we work on a more complete fix. Josh [0] https://github.com/ceph/ceph/pull/35293 On 5/27/20 6:27 AM, Paul Emmerich wrote: Hi, since this bug may lead to data loss when several OSDs cras

[ceph-users] Re: Reducing RAM usage on production MDS

2020-05-28 Thread Patrick Donnelly
On Wed, May 27, 2020 at 10:09 PM Dylan McCulloch wrote: > > Hi all, > > The single active MDS on one of our Ceph clusters is close to running out of > RAM. > > MDS total system RAM = 528GB > MDS current free system RAM = 4GB > mds_cache_memory_limit = 451GB > current mds cache usage = 426GB This

[ceph-users] bluestore - rocksdb level sizes

2020-05-28 Thread Frank R
If I remember correctly, being able to configure the rocksdb level sizes was targeted for Octopus. I was wondering if this feature ever made it into the code as it would be useful when you want to use a drive smaller than 300G for the WAL/DB.
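For anyone looking for the generic knob in the meantime: the level sizes can be passed through bluestore_rocksdb_options. A hedged sketch with illustrative values only; note that setting this option replaces the entire default string, so copy the current default (shown by 'ceph config help bluestore_rocksdb_options') and adjust just the level parameters:

[osd]
bluestore_rocksdb_options = compression=kNoCompression,max_write_buffer_number=4,recycle_log_file_num=4,write_buffer_size=268435456,max_bytes_for_level_base=536870912,max_bytes_for_level_multiplier=10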

[ceph-users] Re: snapshot-based mirroring explanation in docs

2020-05-28 Thread Jason Dillaman
On Thu, May 28, 2020 at 8:44 AM Hans van den Bogert wrote: > Hi list, > > When reading the documentation for the new way of mirroring [1], some > questions arose, especially with the following sentence: > > > Since this mode is not point-in-time consistent, the full snapshot > delta will need to

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-28 Thread EDH - Manuel Rios
Hi, an ATS (Automatic Transfer Switch) works well. We use them in other services for single-PSU servers; they transfer the power from source B to the UPS in nanoseconds, preventing all services from going down. You can get them for 8A, 16A or 32A, always monitorable via SNMP or web interface. -Mensaje

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-28 Thread Burkhard Linke
Hi, On 5/28/20 2:18 PM, Phil Regnauld wrote: Hi, in our production cluster, we have the following setup *snipsnap* Buy some power transfer switches. You can connect those to the two PDUs, and in case of a power failure on one PDU they will still be able to use the second PDU. We only

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-28 Thread Hans van den Bogert
I would second that, there's no winning in this case for your requirements and single PSU nodes. If there were 3 feeds,  then yes; you could make an extra layer in your crushmap much like you would incorporate a rack topology in the crushmap. On 5/28/20 2:42 PM, Chris Palmer wrote: Immediate t
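A minimal sketch of that extra crush layer, reusing the existing rack bucket type to model three feeds (bucket, host and rule names are placeholders):

$ ceph osd crush add-bucket feed-a rack && ceph osd crush move feed-a root=default
$ ceph osd crush add-bucket feed-b rack && ceph osd crush move feed-b root=default
$ ceph osd crush add-bucket feed-c rack && ceph osd crush move feed-c root=default
$ ceph osd crush move node01 rack=feed-a                      # repeat for every host
$ ceph osd crush rule create-replicated by-feed default rack  # spread replicas across feeds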

[ceph-users] snapshot-based mirroring explanation in docs

2020-05-28 Thread Hans van den Bogert
Hi list, When reading the documentation for the new way of mirroring [1], some questions arose, especially with the following sentence: > Since this mode is not point-in-time consistent, the full snapshot delta will need to be synced prior to use during a failover scenario. 1) I'm not sure
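For orientation, a hedged sketch of what driving snapshot-based mirroring looks like on Octopus (pool/image names and the schedule interval are placeholders):

$ rbd mirror image enable mypool/myimage snapshot    # per-image snapshot mirroring mode
$ rbd mirror image snapshot mypool/myimage           # create a mirror-snapshot so the delta gets synced now
$ rbd mirror snapshot schedule add --pool mypool 3h  # or let the scheduler create them periodically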

[ceph-users] Re: CEPH failure domain - power considerations

2020-05-28 Thread Chris Palmer
Immediate thought: Forget about crush maps, osds, etc. If you lose half the nodes (when one power rail fails) your MONs will lose quorum. I don't see how you can win with that configuration... On 28/05/2020 13:18, Phil Regnauld wrote: Hi, in our production cluster, we have the following setup

[ceph-users] CEPH failure domain - power considerations

2020-05-28 Thread Phil Regnauld
Hi, in our production cluster, we have the following setup - 10 nodes - 3 drives / server (so far), mix of SSD and HDD (different pools) + NVMe - dual 10G in LACP, linked to two different switches (Cisco vPC) - OSDs, MONs and MGRs are colocated - A + B power feeds, 2 ATS

[ceph-users] Re: cephfs - modifying the ceph.file.layout of existing files

2020-05-28 Thread Andrej Filipcic
Thanks a lot, I will give it a try, I plan to use that in a very controlled environment anyway. Best regards, Andrej On 2020-05-28 12:21, Luis Henriques wrote: Andrej Filipcic writes: Hi, I have two directories, cache_fast and cache_slow, and I would like to move the least used files fr

[ceph-users] Re: cephfs file layouts, empty objects in first data pool

2020-05-28 Thread Eugen Block
Hi, I've been waiting to respond to this thread for a couple of months now, I wanted to have the latest Nautilus updates installed because we had a lower version than the OP. I tried to reproduce it both with 14.2.3 and now with 14.2.9 and a brand new cephfs (lab environment, newly create

[ceph-users] Re: Cache pools at or near target size but no evict happen

2020-05-28 Thread Eugen Block
I don't see a cache_mode enabled on the pool, did you set one? Zitat von icy chan : Hi, I had configured a cache tier with max object counts 500k. But no evict happens when the object counts hit the configured maximum. Anyone experienced this issue? What should I do? $ ceph health detail HE

[ceph-users] Re: cephfs - modifying the ceph.file.layout of existing files

2020-05-28 Thread Luis Henriques
Andrej Filipcic writes: > Hi, > > I have two directories, cache_fast and cache_slow, and I would like to move > the  > least used files from fast to slow, aka, user side tiering. cache_fast is > pinned > to fast_data ssd pool, while cache_slow to hdd cephfs_data pool. > > $ getfattr -n ceph.dir

[ceph-users] Cache pools at or near target size but no evict happen

2020-05-28 Thread icy chan
Hi, I had configured a cache tier with max object counts 500k. But no evict happens when the object counts hit the configured maximum. Anyone experienced this issue? What should I do? $ ceph health detail HEALTH_WARN 1 cache pools at or near target size CACHE_POOL_NEAR_FULL 1 cache pools at or ne
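For comparison, the settings that normally drive flushing and eviction on a cache tier, with the pool name taken from the follow-up in this thread and illustrative ratios:

$ ceph osd pool set cached-hdd-cache target_max_objects 500000
$ ceph osd pool set cached-hdd-cache cache_target_dirty_ratio 0.4
$ ceph osd pool set cached-hdd-cache cache_target_full_ratio 0.8
$ rados -p cached-hdd-cache cache-flush-evict-all    # manual flush/evict, useful for testing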

[ceph-users] cephfs - modifying the ceph.file.layout of existing files

2020-05-28 Thread Andrej Filipcic
Hi, I have two directories, cache_fast and cache_slow, and I would like to move the  least used files from fast to slow, aka, user side tiering. cache_fast is pinned to fast_data ssd pool, while cache_slow to hdd cephfs_data pool. $ getfattr -n ceph.dir.layout /ceph/grid/cache_fast getfattr
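Since a file's layout can only be changed while the file is still empty, the usual approach (sketched here with a placeholder file name) is to set the target directory's layout and rewrite the file so its data lands in the other pool:

$ setfattr -n ceph.dir.layout.pool -v cephfs_data /ceph/grid/cache_slow
$ cp /ceph/grid/cache_fast/somefile /ceph/grid/cache_slow/somefile && rm /ceph/grid/cache_fast/somefile
$ getfattr -n ceph.file.layout /ceph/grid/cache_slow/somefile   # verify the new pool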

[ceph-users] Re: Reducing RAM usage on production MDS

2020-05-28 Thread Dan van der Ster
Hi Dylan, It looks like you have 10GB of heap to be release -- try `ceph tell mds.$(hostname) heap release` to free that up. Otherwise, I've found it safe to incrementally inject decreased mds_cache_memory_limit's on prod mds's running v12.2.12. I'd start by decreasing the size just a few hundred
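Spelled out, the commands referred to above (MDS name and target size are placeholders; shrink the limit in small steps as described):

$ ceph tell mds.$(hostname) heap stats     # shows how much heap is mapped vs. actually in use
$ ceph tell mds.$(hostname) heap release   # hand freed heap pages back to the OS
$ ceph tell mds.$(hostname) injectargs '--mds_cache_memory_limit=440000000000'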