[ceph-users] Re: Moving OSDs to other hosts with cephadm

2022-01-13 Thread Sage Weil
https://github.com/ceph/ceph/pull/44228 I don't think this has landed in a pacific backport yet, but probably will soon! s On Tue, Jan 11, 2022 at 6:29 PM Bryan Stillwell wrote: > I recently had a server (named aladdin) that was part of my home cluster > die. It held 6 out of 32 OSDs, so to p

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
> best regards, > > samuel > > -- > huxia...@horebdata.cn > > > *From:* Sage Weil > *Date:* 2021-11-18 22:02 > *To:* Manuel Lausch ; ceph-users > > *Subject:* [ceph-users] Re: OSD spend too much time on "waiting for > rea

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-18 Thread Sage Weil
ributing factors (in this case, at least 3). sage On Tue, Nov 16, 2021 at 9:42 AM Sage Weil wrote: > On Tue, Nov 16, 2021 at 8:30 AM Manuel Lausch > wrote: > >> Hi Sage, >> >> its still the same cluster we talked about. I only upgraded it from >> 16.2.5 to 16.2

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-15 Thread Sage Weil
load threshold = 1 > osd scrub priority = 1 > osd scrub thread suicide timeout = 0 > osd snap trim priority = 1 > osd snap trim sleep = 1.0 > public network = 10.88.7.0/24 > > [mon] > mon allow pool delete = false > mon health preluminous compat warning = false

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-11 Thread Sage Weil
rue; > + } >if (!HAVE_FEATURE(recovery_state.get_min_upacting_features(), > SERVER_OCTOPUS)) { > return true; > > > > Von: Peter Lieven > Gesendet: Mittwoch, 10. November 2021 11:37 > An: Manuel Lausch; Sage Weil > Cc: ceph-users@ceph.io > Be

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-08 Thread Sage Weil
is empty, except for the epoch > and creation date. > That is concerning. Can you set debug_mon = 20 and capture a minute or so of logs? (Enough to include a few osdmap epochs.) You can use ceph-post-file to send it to us. Thanks! sage > > > Manuel > > > On Fri, 5 Nov
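For reference, a minimal sketch of that capture step; the log path and the reset command are my additions, not quoted from the thread:

    # raise monitor debug logging so a few osdmap epochs are captured
    ceph config set mon debug_mon 20
    # ...wait a minute or two, then return to the default...
    ceph config rm mon debug_mon
    # upload the mon log for the developers (path is an example)
    ceph-post-file /var/log/ceph/ceph-mon.$(hostname -s).log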

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-05 Thread Sage Weil
Yeah, I think two different things are going on here. The read leases were new, and I think the way that OSDs are marked down is the key things that affects that behavior. I'm a bit surprised that the _notify_mon option helps there, and will take a closer look at that Monday to make sure it's doin

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-04 Thread Sage Weil
process that follows needs to get OSDs' up_thru values to update and there is delay there. Thanks! sage On Thu, Nov 4, 2021 at 4:15 AM Manuel Lausch wrote: > On Tue, 2 Nov 2021 09:02:31 -0500 > Sage Weil wrote: > > > > > > Just to be clear, you should try &

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-02 Thread Sage Weil
memory serves, yes, but the notify_mon process can take more time than a peer OSD getting ECONNREFUSED. The combination above is the recommended combination (and the default). > These days I will test the fast_shutdown switch again and will share the > corresponding logs with you. > Thanks! s
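A small sketch for checking the options this thread is discussing; the option names come from the thread, but which values suit a given cluster is not something this snippet decides:

    # inspect the current values on one OSD
    ceph config get osd.0 osd_fast_shutdown
    ceph config get osd.0 osd_fast_shutdown_notify_mon
    # both can be flipped at runtime for experiments, e.g.:
    ceph config set osd osd_fast_shutdown true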

[ceph-users] Re: OSD spend too much time on "waiting for readable" -> slow ops -> laggy pg -> rgw stop -> worst case osd restart

2021-11-01 Thread Sage Weil
Hi Manuel, I'm looking at the ticket for this issue ( https://tracker.ceph.com/issues/51463) and tried to reproduce. This was initially trivial to do with vstart (rados bench paused for many seconds after stopping an osd) but it turns out that was because the vstart ceph.conf includes `osd_fast_

[ceph-users] A change in Ceph leadership...

2021-10-15 Thread Sage Weil
This fall I will be stepping back from a leadership role in the Ceph project. My primary focus during the next two months will be to work with developers and community members to ensure a smooth transition to a more formal system of governance for the Ceph project. My last day at Red Hat will be in

[ceph-users] ceph jobs

2021-09-08 Thread Sage Weil
Hi everyone, We set up a pad to collect Ceph-related job listings. If you're looking for a job, or have a Ceph-related position to advertise, take a look: https://pad.ceph.com/p/jobs sage ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscr

[ceph-users] Re: clients are using insecure global_id reclaim

2021-07-19 Thread Sage Weil
IIRC 'ceph health mute' is new in octopus (15.2.x). But disabling the mon_warn_on_insecure_global_id_reclaim_allowed setting should be sufficient to make the cluster be quiet... On Mon, Jul 19, 2021 at 10:53 AM Siegfried Höllrigl wrote: > > Hi ! > > We have upgraded our Ceph Cluster to version 1
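A hedged sketch of the two approaches mentioned; the health code and the final auth setting are assumptions based on the standard global_id reclaim warnings, not quoted from this message:

    # Octopus (15.2.x) and later: mute the specific health warning for a week
    ceph health mute AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED 1w
    # any affected release: silence it via the setting named above
    ceph config set mon mon_warn_on_insecure_global_id_reclaim_allowed false
    # once all clients are patched, actually disallow insecure reclaim
    ceph config set mon auth_allow_insecure_global_id_reclaim false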

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-28 Thread Sage Weil
On Fri, Jun 25, 2021 at 10:27 AM Nico Schottelius wrote: > Hey Sage, > > Sage Weil writes: > > Thank you for bringing this up. This is in fact a key reason why the > > orchestration abstraction works the way it does--to allow other > > runtime environments to be suppo

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Tue, Jun 22, 2021 at 1:25 PM Stefan Kooman wrote: > On 6/21/21 6:19 PM, Nico Schottelius wrote: > > And while we are at claiming "on a lot more platforms", you are at the > > same time EXCLUDING a lot of platforms by saying "Linux based > > container" (remember Ceph on FreeBSD? [0]). > > Indeed

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Tue, Jun 22, 2021 at 11:58 AM Martin Verges wrote: > > > There is no "should be", there is no one answer to that, other than 42. > Containers have been there before Docker, but Docker made them popular, > exactly for the same reason as why Ceph wants to use them: ship a known > good version (CI

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Sun, Jun 20, 2021 at 9:51 AM Marc wrote: > Remarks about your cephadm approach/design: > > 1. I am not interested in learning podman, rook or kubernetes. I am using > mesos which is also on my osd nodes to use the extra available memory and > cores. Furthermore your cephadm OC is limited to o

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-24 Thread Sage Weil
On Sat, Jun 19, 2021 at 3:43 PM Nico Schottelius wrote: > Good evening, > > as an operator running Ceph clusters based on Debian and later Devuan > for years and recently testing ceph in rook, I would like to chime in to > some of the topics mentioned here with short review: > > Devuan/OS package:

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-18 Thread Sage Weil
Following up with some general comments on the main container downsides and on the upsides that led us down this path in the first place. Aside from a few minor misunderstandings, it seems like most of the objections to containers boil down to a few major points: > Containers are more complicated

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-18 Thread Sage Weil
On Wed, Jun 2, 2021 at 9:01 AM Daniel Baumann wrote: > > * Ceph users will benefit from both approaches being supported into the > > future > > this is rather important for us as well. > > we use systemd-nspawn based containers (that act and are managed like > traditional VMs, just without the ov

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-18 Thread Sage Weil
On Thu, Jun 3, 2021 at 2:18 AM Marc wrote: > Not using cephadm, I would also question other things like: > > - If it uses docker and docker daemon fails what happens to you containers? This is an obnoxious feature of docker; podman does not have this problem. > - I assume the ceph-osd containers

[ceph-users] Re: Why you might want packages not containers for Ceph deployments

2021-06-18 Thread Sage Weil
I'm arriving late to this thread, but a few things stood out that I wanted to clarify. On Wed, Jun 2, 2021 at 4:28 PM Oliver Freyermuth wrote: > To conclude, I strongly believe there's no one size fits all here. > > That was why I was hopeful when I first heard about the Ceph orchestrator > idea

[ceph-users] Re: Ceph 16.2.3 issues during upgrade from 15.2.10 with cephadm/lvm list

2021-05-10 Thread Sage Weil
The root cause is a bug in conmon. If you can upgrade to >= 2.0.26 this will also fix the problem. What version are you using? The kubic repos currently have 2.0.27. See https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable We'll make sure the next release has the verbosity

[ceph-users] Re: orch upgrade mgr starts too slow and is terminated?

2021-05-06 Thread Sage Weil
Hi! I hit the same issue. This was a bug in 16.2.0 that wasn't completely fixed, but I think we have it this time. Kicking off a 16.2.3 build now to resolve the problem. (Basically, sometimes docker calls the image docker.io/ceph/ceph:foo and sometimes it's ceph/ceph:foo, and our attempt to nor

[ceph-users] Re: cephadm and ha service for rgw

2021-04-07 Thread Sage Weil
Hi Seba, The RGW HA mode is still buggy, and is getting reworked. I'm hoping we'll have it sorted by the .2 release or so. In the meantime, you can configure haproxy and/or keepalived yourself or use whatever other load balancer you'd like... s On Sat, Apr 3, 2021 at 9:39 PM Seba chanel wrote

[ceph-users] Re: cephadm/podman :: upgrade to pacific stuck

2021-04-07 Thread Sage Weil
You would normally tell cephadm to deploy another mgr with 'ceph orch apply mgr 2'. In this case, the default placement policy for mgrs is already either 2 or 3, though--the problem is that you only have 1 host in your cluster, and cephadm currently doesn't handle placing multiple mgrs on a single
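A sketch of the placement commands being described; hostnames are placeholders, and this assumes at least two hosts have already been added to the cluster:

    # ask cephadm to run two mgr daemons, letting it pick the hosts
    ceph orch apply mgr 2
    # or pin them to specific hosts
    ceph orch apply mgr --placement="host1 host2"
    # verify where the daemons landed
    ceph orch ps | grep mgr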

[ceph-users] Re: cephadm upgrade to pacific

2021-04-07 Thread Sage Weil
Can you share the output of 'ceph log last cephadm'? I'm wondering if you are hitting https://tracker.ceph.com/issues/50114 Thanks! s On Mon, Apr 5, 2021 at 4:00 AM Peter Childs wrote: > > I am attempting to upgrade a Ceph Upgrade cluster that was deployed with > Octopus 15.2.8 and upgraded to

[ceph-users] Re: ceph orch update fails - got new digests

2021-04-03 Thread Sage Weil
I have a proposed fix for this here: https://github.com/ceph/ceph/pull/40577 Unfortunately, this won't help you until it is tested and merged and included in 16.2.1. If you'd like to finish your upgrade before then, you can upgrade to the pacific branch tip with ceph orch upgrade start quay.ce
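The general shape of that upgrade command, with a placeholder image since the message is truncated here; the image reference below is not the one from the thread:

    # point the whole cluster at a specific container image
    ceph orch upgrade start --image <registry>/<repository>:<tag>
    # follow progress
    ceph orch upgrade status
    ceph -W cephadm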

[ceph-users] Re: ceph orch update fails - got new digests

2021-04-02 Thread Sage Weil
On Fri, Apr 2, 2021 at 12:08 PM Alexander Sporleder wrote: > > Hello Sage, thank you for your response! > > I had some problems updating 15.2.8 -> 15.2.9 but after updating Podman > to 3.0.1 and Ceph to 15.2.10 everything was fine again. > > Then I started the update 15.2.10 -> 16.2.0 and in the b

[ceph-users] Re: ceph orch update fails - got new digests

2021-04-02 Thread Sage Weil
I'm a bit confused by the log messages--I'm not sure why the target_digests aren't changing. Can you post the whole ceph-mgr.mon-a-02.tvcrfq.log? (ceph-post-file /var/log/ceph/*/ceph-mgr.mon-a-02.tvcrfq.log) Thanks! s ___ ceph-users mailing list -- cep

[ceph-users] Re: ceph orch update fails - got new digests

2021-04-02 Thread Sage Weil
Hi Alex, Thanks for the report! I've opened https://tracker.ceph.com/issues/50114. It looks like the target_digests check needs to check for overlap instead of equality. sage On Fri, Apr 2, 2021 at 4:04 AM Alexander Sporleder wrote: > > Hello Ceph user list! > > I tried to update Ceph 15.2.10

[ceph-users] Re: understanding orchestration and cephadm

2021-03-31 Thread Sage Weil
Hi Gary, It looks like everything you did is fine. I think the "problem" is that cephadm has/had some logic that tried to leave users with an odd number of monitors. I'm pretty sure this is why two of them were removed. This code has been removed in pacific, and should probably be backported to

Re: linux-next: new contact(s) for the ceph tree?

2020-05-08 Thread Sage Weil
Jeff Layton Thanks, Stephen! sage On Sat, 9 May 2020, Stephen Rothwell wrote: > Hi all, > > I noticed commit > > 3a5ccecd9af7 ("MAINTAINERS: remove myself as ceph co-maintainer") > > appear recently. So who should I now list as the contact(s) for the > ceph tree? > > -- > Cheers, > Ste

[ceph-users] Re: ceph cephadm generate-key => No such file or directory: '/tmp/tmp4ejhr7wh/key'

2020-03-30 Thread Sage Weil
On Mon, 30 Mar 2020, Ml Ml wrote: > Hello List, > > is this a bug? > > root@ceph02:~# ceph cephadm generate-key > Error EINVAL: Traceback (most recent call last): > File "/usr/share/ceph/mgr/cephadm/module.py", line 1413, in _generate_key > with open(path, 'r') as f: > FileNotFoundError: [E

[ceph-users] Leave of absence...

2020-03-27 Thread Sage Weil
Hi everyone, I am taking time off from the Ceph project and from Red Hat, starting in April and extending through the US election in November. I will initially be working with an organization focused on voter registration and turnout and combating voter suppression and disinformation campaigns.

[ceph-users] Re: v15.2.0 Octopus released

2020-03-27 Thread Sage Weil
One word of caution: there is one known upgrade issue if you - upgrade from luminous to nautilus, and then - run nautilus for a very short period of time (hours), and then - upgrade from nautilus to octopus that prevents OSDs from starting. We have a fix that will be in 15.2.1, but until tha

[ceph-users] Re: v15.2.0 Octopus released

2020-03-24 Thread Sage Weil
On Tue, 24 Mar 2020, konstantin.ilya...@mediascope.net wrote: > Is it poosible to provide instructions about upgrading from CentOs7+ > ceph 14.2.8 to CentOs8+ceph 15.2.0 ? You have ~2 options: - First, upgrade Ceph packages to 15.2.0. Note that your dashboard will break temporarily. Then, upg

[ceph-users] Q release name

2020-03-23 Thread Sage Weil
Hi everyone, As we wrap up Octopus and kick off development for Pacific, now it seems like a good idea to sort out what to call the Q release. Traditionally/historically, these have always been names of cephalopod species--usually the "common name", but occasionally a Latin name (infernalis).

[ceph-users] Re: Error in Telemetry Module... again

2020-03-17 Thread Sage Weil
This is a known issue--it will be fixed in the next nautilus point release. On Tue, 17 Mar 2020, Tecnologia Charne.Net wrote: > Hello! > > I updated monitors to 14.2.8 and I have now: > > health: HEALTH_ERR > Module 'telemetry' has failed: cannot concatenate 'str' and 'UUID' > obje

[ceph-users] Re: Disabling Telemetry

2020-03-07 Thread Sage Weil
On Sat, 7 Mar 2020, m...@silvenga.com wrote: > Is there another way to disable telemetry then using: > > > ceph telemetry off > > Error EIO: Module 'telemetry' has experienced an error and cannot handle > > commands: cannot concatenate 'str' and 'UUID' objects > > I'm attempting to get all my cl

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote: > > > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote: > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > On Thu, Mar 5, 2020 at 4:42 PM

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > Hi Sage, > > > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > > >

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > Hi Sage, > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > Hi all, > > > > > > There's something broken in our env when we try to add

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > Hi all, > > There's something broken in our env when we try to add new mons to > existing clusters, confirmed on two clusters running mimic and > nautilus. It's basically this issue > https://tracker.ceph.com/issues/42830 > > In case something is wron

[ceph-users] Re: Octopus release announcement

2020-03-02 Thread Sage Weil
It's getting close. My guess is 1-2 weeks away. On Mon, 2 Mar 2020, Alex Chalkias wrote: > Hello, > > I was looking for an official announcement for Octopus release, as the > latest update (back in Q3/2019) on the subject said it was scheduled for > March 1st. > > Any updates on that? > > BR,

[ceph-users] Re: Cache tier OSDs crashing due to unfound hitset object 14.2.7

2020-02-27 Thread Sage Weil
If the pg in question can recover without that OSD, I would use ceph-objectstore-tool to export and remove it, and then move on. I hit a similar issue on my system (due to a bug in an early octopus build) and it was super tedious to fix up manually (needed patched code and manual modificat
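A hedged sketch of that export-and-remove step; OSD id, PG id, and file names are placeholders, and the OSD must be stopped while ceph-objectstore-tool runs:

    systemctl stop ceph-osd@12
    # keep a copy of the PG before removing anything
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 14.2b --op export --file /root/pg-14.2b.export
    # then drop the unrecoverable copy from this OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 14.2b --op remove --force
    systemctl start ceph-osd@12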

[ceph-users] Re: Running cephadm as a nonroot user

2020-02-10 Thread Sage Weil
There is a 'packaged' mode that does this, but it's a bit different: - you have to install the cephadm package on each host - the package sets up a cephadm user and sudoers.d file - mgr/cephadm will ssh in as that user and sudo as needed The net is that you have to make sure cephadm is installed
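Roughly what that packaged/non-root setup looks like; the sudoers content shown is an assumption about what the package ships, so treat this as a sketch rather than the exact files:

    # on every host: install the cephadm package (creates the cephadm user)
    apt install cephadm        # Debian/Ubuntu
    dnf install cephadm        # RHEL/CentOS
    # the package drops a sudoers.d entry roughly like:
    #   cephadm ALL=(ALL) NOPASSWD: ALL
    # tell the orchestrator to ssh in as that user instead of root
    ceph cephadm set-user cephadm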

[ceph-users] Re: [Ceph-community] HEALTH_WARN - daemons have recently crashed

2020-02-05 Thread Sage Weil
[Moving this to ceph-users@ceph.io] This looks like https://tracker.ceph.com/issues/43365, which *looks* like it is an issue with the standard libraries in ubuntu 18.04. One user said: "After upgrading our monitor Ubuntu 18.04 packages (apt-get upgrade) with the 5.3.0-26-generic kernel, it see

[ceph-users] Cephalocon Seoul is canceled

2020-02-04 Thread Sage Weil
Hi everyone, We are sorry to announce that, due to the recent coronavirus outbreak, we are canceling Cephalocon for March 3-5 in Seoul. More details will follow about how to best handle cancellation of hotel reservations and so forth. Registrations will of course be refunded--expect an email

[ceph-users] Re: No Activity?

2020-01-28 Thread Sage Weil
On Tue, 28 Jan 2020, dhils...@performair.com wrote: > All; > > I haven't had a single email come in from the ceph-users list at ceph.io > since 01/22/2020. > > Is there just that little traffic right now? I'm seeing 10-20 messages per day. Confirm your registration and/or check your filters?

[ceph-users] Cephalocon early-bird registration ends today

2020-01-21 Thread Sage Weil
Hi everyone, Quick reminder that the early-bird registration for Cephalocon Seoul (Mar 3-5) ends tonight! We also have the hotel booking link and code up on the site (finally--sorry for the delay). https://ceph.io/cephalocon/seoul-2020/ Hope to see you there! sage

[ceph-users] Re: High CPU usage by ceph-mgr in 14.2.5

2019-12-18 Thread Sage Weil
On Wed, 18 Dec 2019, Bryan Stillwell wrote: > After upgrading one of our clusters from Nautilus 14.2.2 to Nautilus 14.2.5 > I'm seeing 100% CPU usage by a single ceph-mgr thread (found using 'top -H'). > Attaching to the thread with strace shows a lot of mmap and munmap calls. > Here's the dis

[ceph-users] Re: Ceph on CentOS 8?

2019-12-13 Thread Sage Weil
On Fri, 13 Dec 2019, Manuel Lausch wrote: > Hi, > > I am interested in el8 Packages as well. > Is there any plan to provide el8 packages in the near future? Ceph Octopus will be based on CentOS 8. It's due out in March. The centos8 transition is awkward because our python 2 dependencies don't

[ceph-users] Re: Cephalocon 2020

2019-12-10 Thread Sage Weil
On Tue, 10 Dec 2019, Sage Weil wrote: > Hi everyone, > > The next Cephalocon is coming up on March 3-5 in Seoul! The CFP is open > until Friday (get your talks in!). We expect to have the program > ready for the first week of January. Registration (early bird) will be &

[ceph-users] Cephalocon 2020

2019-12-10 Thread Sage Weil
Hi everyone, The next Cephalocon is coming up on March 3-5 in Seoul! The CFP is open until Friday (get your talks in!). We expect to have the program ready for the first week of January. Registration (early bird) will be available soon. We're also looking for sponsors for the conference. T

[ceph-users] Re: v14.2.5 Nautilus released

2019-12-10 Thread Sage Weil
> > If you are not comfortable sharing device metrics, you can disable that > > channel first before re-opting-in: > > > > ceph config set mgr mgr/telemetry/channel_crash false > > This should be channel_device, right? Yep! https://github.com/ceph/ceph/pull/32148 Thanks, sage __
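A short sketch of the corrected re-opt-in sequence; the status check at the end is my addition:

    # disable only the device-metrics channel, then re-enable telemetry
    ceph config set mgr mgr/telemetry/channel_device false
    ceph telemetry on
    # confirm which channels are active
    ceph telemetry status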

[ceph-users] v13.2.7 mimic released

2019-11-25 Thread Sage Weil
This is the seventh bugfix release of the Mimic v13.2.x long term stable release series. We recommend all Mimic users upgrade. For the full release notes, see https://ceph.io/releases/v13-2-7-mimic-released/ Notable Changes MDS: - Cache trimming is now throttled. Dropping the MDS cac

[ceph-users] Cephalocon 2020 will be March 4-5 in Seoul, South Korea!

2019-11-20 Thread Sage Weil
Hi everyone, We're pleased to announce that the next Cephalocon will be March 3-5 in Seoul, South Korea! https://ceph.com/cephalocon/seoul-2020/ The CFP for the conference is now open: https://linuxfoundation.smapply.io/prog/cephalocon_2020 Main conference: March 4-5 Developer

Re: [ceph-users] Revert a CephFS snapshot?

2019-11-14 Thread Sage Weil
On Thu, 14 Nov 2019, Patrick Donnelly wrote: > On Wed, Nov 13, 2019 at 6:36 PM Jerry Lee wrote: > > > > On Thu, 14 Nov 2019 at 07:07, Patrick Donnelly wrote: > > > > > > On Wed, Nov 13, 2019 at 2:30 AM Jerry Lee wrote: > > > > Recently, I'm evaluating the snpahsot feature of CephFS from kernel >

[ceph-users] Possible data corruption with 14.2.3 and 14.2.4

2019-11-14 Thread Sage Weil
Hi everyone, We've identified a data corruption bug[1], first introduced[2] (by yours truly) in 14.2.3 and affecting both 14.2.3 and 14.2.4. The corruption appears as a rocksdb checksum error or assertion that looks like os/bluestore/fastbmap_allocator_impl.h: 750: FAILED ceph_assert(available

[ceph-users] Re: Past_interval start interval mismatch (last_clean_epoch reported)

2019-11-10 Thread Sage Weil
On Sun, 10 Nov 2019, c...@elchaka.de wrote: > IIRC there is a ~history_ignore Option which could be Help in your Test > environment. This option is dangerous and can lead to data loss if used incorrectly. I suggest making backups of all PG instances with ceph-objectstore-tool before using it.

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-07 Thread Sage Weil
ow disk health monitoring is disabled) - > > but updating the mgrs alone should also be fine with us. I hope to > > have time for the experiment later today ;-). > > > > Cheers, > > Oliver > > > > Am 07.11.19 um 08:57 schrieb Thomas Schneider: > >> H

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-06 Thread Sage Weil
on again, > > and am waiting for them to become silent again. Let's hope the issue > > reappears before the disks run full of logs ;-). > > > > Cheers, > > Oliver > > > > Am 02.11.19 um 02:56 schrieb Sage Weil: > >> On Sat, 2 Nov 20

[ceph-users] Re: mgr daemons becoming unresponsive

2019-11-01 Thread Sage Weil
On Sat, 2 Nov 2019, Oliver Freyermuth wrote: > Dear Cephers, > > interestingly, after: > ceph device monitoring off > the mgrs seem to be stable now - the active one still went silent a few > minutes later, > but the standby took over and was stable, and restarting the broken one, it's > now st

[ceph-users] Re: subtrees have overcommitted (target_size_bytes / target_size_ratio)

2019-11-01 Thread Sage Weil
This was fixed a few weeks back. It should be resolved in 14.2.5. https://tracker.ceph.com/issues/41567 https://github.com/ceph/ceph/pull/31100 sage On Fri, 1 Nov 2019, Lars Täuber wrote: > Is there anybody who can explain the overcommitment calculation? > > Thanks > > > Mon, 28 Oct 2019 11
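For anyone hitting the warning before 14.2.5, a sketch of the settings the overcommit check is based on; the pool name and values are placeholders:

    # hint the autoscaler with an expected fraction of raw capacity...
    ceph osd pool set mypool target_size_ratio 0.2
    # ...or with an absolute expected size in bytes (10 GiB here)
    ceph osd pool set mypool target_size_bytes 10737418240
    # review the autoscaler's view of the pools
    ceph osd pool autoscale-status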

[ceph-users] Re: 0B OSDs

2019-10-25 Thread Sage Weil
On Fri, 25 Oct 2019, dhils...@performair.com wrote: > All; > > We're setting up our second cluster, using version 14.2.4, and we've run into > a weird issue: all of our OSDs are created with a size of 0 B. Weights are > appropriate for the size of the underlying drives, but ceph -s shows this:

[ceph-users] Re: PG badly corrupted after merging PGs on mixed FileStore/BlueStore setup

2019-10-23 Thread Sage Weil
On Wed, 23 Oct 2019, Paul Emmerich wrote: > Hi, > > I'm working on a curious case that looks like a bug in PG merging > maybe related to FileStore. > > Setup is 14.2.1 that is half BlueStore half FileStore (being > migrated), and the number of PGs on an RGW index pool were reduced, > now one of t

[ceph-users] Re: Sick Nautilus cluster, OOM killing OSDs, lots of osdmaps

2019-10-09 Thread Sage Weil
[adding dev] On Wed, 9 Oct 2019, Aaron Johnson wrote: > Hi all > > I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4. > Monthly OS patching and reboots that go along with it have resulted in > the cluster getting very unwell. > > Many of the servers in the cluster are OOM-ki

[ceph-users] Re: Ceph and centos 8

2019-10-01 Thread Sage Weil
On Tue, 1 Oct 2019, f...@lpnhe.in2p3.fr wrote: > Hi, > We have a ceph+cephfs cluster runing nautilus version 14.2.4 > We have debian buster/ubuntu bionic clients mounting cephfs in kernel mode > without problems. > We now want to mount cephfs from our new centos 8 clients. Unfortunately, > ceph-c

[ceph-users] Re: Crush device class switchover

2019-09-30 Thread Sage Weil
On Mon, 30 Sep 2019, Reed Dier wrote: > I currently have two roots in my crush map, one for HDD devices and one for > SSD devices, and have had it that way since Jewel. > > I am currently on Nautilus, and have had my crush device classes for my OSD's > set since Luminous. > > > ID CLASS WEIGHT

[ceph-users] Re: Seemingly unbounded osd_snap keys in monstore. Normal? Expected?

2019-09-23 Thread Sage Weil
g a full record of past snap deletions was changed, so we may need to make further improvements for octopus. Thanks! sage > > From: Sage Weil > Sent: Monday, September 23, 2019 9:41 AM > To: Koebbe, Brian > Cc: ceph-users@ceph.io ; d...@ceph.io > Subject: Re: [ce

[ceph-users] Re: Seemingly unbounded osd_snap keys in monstore. Normal? Expected?

2019-09-23 Thread Sage Weil
Hi, On Mon, 23 Sep 2019, Koebbe, Brian wrote: > Our cluster has a little over 100 RBDs. Each RBD is snapshotted with a > typical "frequently", hourly, daily, monthly type of schedule. > A while back a 4th monitor was temporarily added to the cluster that took > hours to synchronize with the oth

[ceph-users] Re: BlueStore _txc_add_transaction errors (possibly related to bug #38724)

2019-08-09 Thread Sage Weil
On Fri, 9 Aug 2019, Florian Haas wrote: > Hi everyone, > > it seems there have been several reports in the past related to > BlueStore OSDs crashing from unhandled errors in _txc_add_transaction: > > http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-April/03.html > http://lists.ceph.co

[ceph-users] Re: Bluestore caching oddities, again

2019-08-07 Thread Sage Weil
On Thu, 8 Aug 2019, Christian Balzer wrote: > > Hello again, > > Getting back to this: > On Sun, 4 Aug 2019 10:47:27 +0900 Christian Balzer wrote: > > > Hello, > > > > preparing the first production bluestore, nautilus (latest) based cluster > > I've run into the same things other people and my

Re: [ceph-users] Nautilus 14.2.2 release announcement

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Alex Litvak wrote: > Dear Ceph developers, > > Please forgive me if this post offends anyone, but it would be nice if this > and all other releases would be announced before or shortly after they hit the > repos. Yep, my fault. Abhishek normally does this but he's out on vac

Re: [ceph-users] Legacy BlueStore stats reporting?

2019-07-19 Thread Sage Weil
On Fri, 19 Jul 2019, Stig Telfer wrote: > > On 19 Jul 2019, at 10:01, Konstantin Shalygin wrote: > >> Using Ceph-Ansible stable-4.0 I did a rolling update from latest Mimic to > >> Nautilus 14.2.2 on a cluster yesterday, and the update ran to completion > >> successfully. > >> > >> However, in

Re: [ceph-users] ceph mon crash - ceph mgr module ls -f plain

2019-07-17 Thread Sage Weil
Thanks, opened bug https://tracker.ceph.com/issues/40804. Fix should be trivial. sage On Wed, 17 Jul 2019, Oskar Malnowicz wrote: > Hello, > when I execute the following command on one of my three ceph-mon, all > ceph-mon crashes. > > ceph mgr module ls -f plain > > ceph version 14.2.1 (d55

Re: [ceph-users] Changing the release cadence

2019-07-15 Thread Sage Weil
On Mon, 15 Jul 2019, Kaleb Keithley wrote: > On Mon, Jul 15, 2019 at 10:10 AM Sage Weil wrote: > > > On Mon, 15 Jul 2019, Kaleb Keithley wrote: > > > > > > If Octopus is really an LTS release like all the others, and you want > > > bleeding edge users to te

Re: [ceph-users] Changing the release cadence

2019-07-15 Thread Sage Weil
On Mon, 15 Jul 2019, Kaleb Keithley wrote: > On Wed, Jun 5, 2019 at 11:58 AM Sage Weil wrote: > > > ... > > > > This has mostly worked out well, except that the mimic release received > > less attention that we wanted due to the fact that multiple downstream >

[Bug 1835354] Re: disco: ceph-mgr unable to load crash module under py3

2019-07-12 Thread Sage Weil
https://github.com/ceph/ceph/pull/29029 -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1835354 Title: disco: ceph-mgr unable to load crash module under py3 To manage notifications about this bug go

Re: [ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Sage Weil
On Fri, 12 Jul 2019, Nathan Fish wrote: > Thanks. Speaking of 14.2.2, is there a timeline for it? We really want > some of the fixes in it as soon as possible. I think it's basically ready now... probably Monday? sage > > On Fri, Jul 12, 2019 at 11:22 AM Sage Weil wrote:

[ceph-users] Pool stats issue with upgrades to nautilus

2019-07-12 Thread Sage Weil
Hi everyone, All current Nautilus releases have an issue where deploying a single new (Nautilus) BlueStore OSD on an upgraded cluster (i.e. one that was originally deployed pre-Nautilus) breaks the pool utilization stats reported by ``ceph df``. Until all OSDs have been reprovisioned or updat

Re: [ceph-users] How does monitor know OSD is dead?

2019-07-03 Thread Sage Weil
On Sun, 30 Jun 2019, Bryan Henderson wrote: > > I'm not sure why the monitor did not mark it _out_ after 600 seconds > > (default) > > Well, that part I understand. The monitor didn't mark the OSD out because the > monitor still considered the OSD up. No reason to mark an up OSD out. > > I thin

[ceph-users] Octopus release target: March 1 2020

2019-07-03 Thread Sage Weil
Hi everyone, The target release date for Octopus is March 1, 2020. The freeze will be January 1, 2020. As a practical matter, that means any features need to be in before people leave for the holidays, ensuring the features get in in time and also that we can run tests over the holidays while

Re: [RFC PATCH] ceph: initialize superblock s_time_gran to 1

2019-06-27 Thread Sage Weil
de was originally > merged. Was this an earlier limitation of ceph that is no longer > applicable? > > In any case, I see no need at all to keep this at 1000, so: As long as the encoded on-wire time value is at ns resolution, I agree! No recollection of why I did this :( Reviewed-by: Sage Weil

[ceph-users] Tech Talk tomorrow: Intro to Ceph

2019-06-26 Thread Sage Weil
Hi everyone, Tomorrow's Ceph Tech Talk will be an updated "Intro to Ceph" talk by Sage Weil. This will be based on a newly refreshed set of slides and provide a high-level introduction to the overall Ceph architecture, RGW, RBD, and CephFS. Our plan is to follow-up later t

Re: [ceph-users] Changing the release cadence

2019-06-26 Thread Sage Weil
people out for vacations) right in the middle of the lead-up to the freeze. Thoughts? sage On Wed, 26 Jun 2019, Sage Weil wrote: > On Wed, 26 Jun 2019, Alfonso Martinez Hidalgo wrote: > > I think March is a good idea. > > Spring had a slight edge over fall in the twitter pol

Re: [ceph-users] Changing the release cadence

2019-06-26 Thread Sage Weil
mber. > > > > For example, Nautilus was set to release in February and we got it out > late in late March (Almost April) > > Would love to see more of a discussion around solving the problem of > releasing when we say we are going to - so that we can then choose >

Re: [ceph-users] Changing the release cadence

2019-06-26 Thread Sage Weil
On Tue, 25 Jun 2019, Alfredo Deza wrote: > On Mon, Jun 17, 2019 at 4:09 PM David Turner wrote: > > > > This was a little long to respond with on Twitter, so I thought I'd share > > my thoughts here. I love the idea of a 12 month cadence. I like October > > because admins aren't upgrading product

Re: [ceph-users] Changing the release cadence

2019-06-17 Thread Sage Weil
On Wed, 5 Jun 2019, Sage Weil wrote: > That brings us to an important decision: what time of year should we > release? Once we pick the timing, we'll be releasing at that time *every > year* for each release (barring another schedule shift, which we want to > avoid), so let

Re: [ceph-users] mutable health warnings

2019-06-14 Thread Sage Weil
On Thu, 13 Jun 2019, Neha Ojha wrote: > Hi everyone, > > There has been some interest in a feature that helps users to mute > health warnings. There is a trello card[1] associated with it and > we've had some discussion[2] in the past in a CDM about it. In > general, we want to understand a few th

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Sage Weil
On Thu, 13 Jun 2019, Harald Staub wrote: > On 13.06.19 15:52, Sage Weil wrote: > > On Thu, 13 Jun 2019, Harald Staub wrote: > [...] > > I think that increasing the various suicide timeout options will allow > it to stay up long enough to clean up the ginormous objects:

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Sage Weil
depending on the nature of the problem; I suggested new OSDs as import > target) > > Paul > > On Thu, Jun 13, 2019 at 3:52 PM Sage Weil wrote: > > > On Thu, 13 Jun 2019, Harald Staub wrote: > > > Idea received from Wido den Hollander: > > > bluestore

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-13 Thread Sage Weil
locked ios and so forth). (Side note that since you started the OSD read-write using the internal copy of rocksdb, don't forget that the external copy you extracted (/mnt/ceph/db?) is now stale!) sage > > Any opinions? > > Thanks! > Harry > > On 13.06.19 09:32, Ha

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Wed, 12 Jun 2019, Sage Weil wrote: > On Thu, 13 Jun 2019, Simon Leinen wrote: > > Sage Weil writes: > > >> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column > > >> families: [default] > > >> Unrecognized command: stats > > >

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Thu, 13 Jun 2019, Simon Leinen wrote: > Sage Weil writes: > >> 2019-06-12 23:40:43.555 7f724b27f0c0 1 rocksdb: do_open column families: > >> [default] > >> Unrecognized command: stats > >> ceph-kvstore-tool: /build/ceph-14.2.1/src/rocksdb/db/ver

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Wed, 12 Jun 2019, Simon Leinen wrote: > We hope that we can get some access to S3 bucket indexes back, possibly > by somehow dropping and re-creating those indexes. Are all 3 OSDs crashing in the same way? My guess is that the reshard process triggered some massive rocksdb transaction that in

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Wed, 12 Jun 2019, Simon Leinen wrote: > Sage Weil writes: > > What happens if you do > > > ceph-kvstore-tool rocksdb /mnt/ceph/db stats > > (I'm afraid that our ceph-kvstore-tool doesn't know about a "stats" > command; but it still trie

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Wed, 12 Jun 2019, Simon Leinen wrote: > Dear Sage, > > > Also, can you try ceph-bluestore-tool bluefs-export on this osd? I'm > > pretty sure it'll crash in the same spot, but just want to confirm > > it's a bluefs issue. > > To my surprise, this actually seems to have worked: > > $ time s
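For context, the rough shape of that export and a follow-up offline inspection; the OSD path and output directory are placeholders chosen to match the /mnt/ceph/db path quoted in the thread:

    # dump the OSD's embedded rocksdb (bluefs) files to a directory
    ceph-bluestore-tool bluefs-export \
        --path /var/lib/ceph/osd/ceph-NN --out-dir /mnt/ceph
    # then poke at the extracted database offline (e.g. list keys)
    ceph-kvstore-tool rocksdb /mnt/ceph/db list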

Re: [ceph-users] rocksdb corruption, stale pg, rebuild bucket index

2019-06-12 Thread Sage Weil
On Wed, 12 Jun 2019, Harald Staub wrote: > On 12.06.19 17:40, Sage Weil wrote: > > On Wed, 12 Jun 2019, Harald Staub wrote: > > > Also opened an issue about the rocksdb problem: > > > https://tracker.ceph.com/issues/40300 > > > > Thanks! > > >
