Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread Tarek Zegar
It's not just mimic to nautilus; I confirmed it with luminous to mimic as well. They are checking for clean PGs with the flags set; they should unset the flags, then check, set the flags again, and move on to the next OSD. - Original message - From: solarflow99 To: Tarek Zegar Cc: Ceph Users Subject: [EXTERNAL] Re:
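A minimal sketch of the manual workaround described above, assuming the standard Ceph CLI and the health-check names used in Mimic/Nautilus:

    # unset the flags so recovery can finish while client IO continues
    ceph osd unset norebalance
    ceph osd unset noout
    # wait until no PGs are degraded or misplaced
    while ceph health detail | grep -Eq 'PG_DEGRADED|OBJECT_MISPLACED'; do sleep 10; done
    # re-set the flags before restarting the next OSD
    ceph osd set norebalance
    ceph osd set noout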

Re: [ceph-users] ceph-volume ignores cluster name?

2019-05-13 Thread Alfredo Deza
On Mon, May 13, 2019 at 6:56 PM wrote: > > All; > > I'm working on spinning up a demonstration cluster using ceph, and yes, I'm > installing it manually, for the purpose of learning. > > I can't seem to correctly create an OSD, as ceph-volume seems to only work if > the cluster name is the

[ceph-users] ceph-volume ignores cluster name?

2019-05-13 Thread DHilsbos
All; I'm working on spinning up a demonstration cluster using ceph, and yes, I'm installing it manually, for the purpose of learning. I can't seem to correctly create an OSD, as ceph-volume seems to only work if the cluster name is the default. If I rename my configuration file (at
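For reference, a sketch of the default-name workflow ceph-volume expects: it reads /etc/ceph/ceph.conf, and custom cluster names are deprecated, so keeping the default name is the simplest path (the device name below is hypothetical):

    # create a bluestore OSD from a raw device using the default cluster name
    ceph-volume lvm create --data /dev/sdb
    # list what ceph-volume knows about
    ceph-volume lvm list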

Re: [ceph-users] Rolling upgrade fails with flag norebalance with background IO

2019-05-13 Thread solarflow99
Are you sure you can really use 3.2 for nautilus? On Fri, May 10, 2019 at 7:23 AM Tarek Zegar wrote: > Ceph-ansible 3.2, rolling upgrade mimic -> nautilus. The ansible file sets > the flag "norebalance". When there is *no* I/O to the cluster, the upgrade works > fine. When upgrading with IO running in

Re: [ceph-users] Major ceph disaster

2019-05-13 Thread Dan van der Ster
Presumably the 2 OSDs you marked as lost were hosting those incomplete PGs? It would be useful to double-confirm that: check with `ceph pg query` and `ceph pg dump`. (If so, this is why the ignore_history_les trick isn't helping; you don't have the minimum 3 shards up for those 3+1 PGs.) If
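A sketch of that confirmation step, with a hypothetical PG id:

    # list stuck PGs together with their up/acting OSD sets
    ceph pg dump_stuck inactive
    # inspect one incomplete PG and check whether the lost OSDs appear in its up/acting sets
    ceph pg 2.1f query | grep -A5 '"up"\|"acting"'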

Re: [ceph-users] Major ceph disaster

2019-05-13 Thread Lionel Bouton
On 13/05/2019 at 16:20, Kevin Flöh wrote: > Dear ceph experts, > > [...] We have 4 nodes with 24 osds each and use 3+1 erasure coding. [...] > Here is what happened: One osd daemon could not be started and > therefore we decided to mark the osd as lost and set it up from > scratch. Ceph started
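For context, a quick way to check what a 3+1 erasure-coded pool tolerates (profile and pool names are hypothetical):

    ceph osd erasure-code-profile get ec-3-1   # shows k=3, m=1: only one failure can be absorbed
    ceph osd pool get ec_pool min_size         # how many shards must be up for IO to continue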

[ceph-users] Ceph MGR CRASH : balancer module

2019-05-13 Thread Tarek Zegar
Hello, my manager daemon keeps dying; the last meta log is below. What is causing this? I do have two roots in the osd tree with shared hosts (see below); I can't imagine that is causing the balancer to fail? meta log: { "crash_id": "2019-05-11_19:09:17.999875Z_aa7afa7c-bc7e-43ec-b32a-821bd47bd68b",
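A few commands that may help narrow this down (a sketch, not a fix): check the balancer state and the duplicate-root layout mentioned above, and optionally stop the module while debugging.

    ceph balancer status          # current mode and whether it is active
    ceph osd tree                 # shows the two roots with shared hosts
    ceph balancer off             # keep the module from crashing the mgr while investigating
    ceph crash ls                 # list the crashes recorded by the crash module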

[ceph-users] mimic: MDS standby-replay causing blocked ops (MDS bug?)

2019-05-13 Thread Frank Schilder
Short story: We have a new HPC installation with file systems provided by cephfs (home, apps, ...). We have one cephfs and all client file systems are sub-directory mounts. On this ceph file system, we have a bit more than 500 nodes with currently 2 ceph fs mounts each, resulting
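A minimal sketch of the sub-directory mount pattern described above, using the kernel client; the monitor address, client name and paths are hypothetical:

    # mount only the /home sub-tree of the single cephfs
    mount -t ceph 192.168.1.10:6789:/home /home \
        -o name=hpc_home,secretfile=/etc/ceph/hpc_home.secret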

[ceph-users] Ceph Health 14.2.1 Dont report slow OPS

2019-05-13 Thread EDH - Manuel Rios Fernandez
Hi, the latest version of Ceph is no longer reporting slow ops in the dashboard and CLI? Bug, or expected? ceph version 14.2.1 (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable) Linux 3.10.0-957.12.1.el7.x86_64 #1 SMP Mon Apr 29 14:59:59 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
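Regardless of what the dashboard and `ceph -s` show, slow/blocked ops can still be pulled from the daemons directly; a sketch with a hypothetical OSD id:

    ceph health detail                    # should list SLOW_OPS if any are being tracked
    ceph daemon osd.12 dump_blocked_ops   # ops blocked longer than the complaint threshold
    ceph daemon osd.12 dump_historic_ops  # recently completed slow ops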

Re: [ceph-users] Post-mortem analisys?

2019-05-13 Thread Marco Gaiarin
Hi! Martin Verges wrote... > first of all, hyperconverged setups with publicly accessible VMs could be > affected by DDoS attacks or other harmful issues that cause cascading errors > in your infrastructure. No, private cluster. > Are you sure your network worked

[ceph-users] Major ceph disaster

2019-05-13 Thread Kevin Flöh
Dear ceph experts, we have several (maybe related) problems with our ceph cluster, let me first show you the current ceph status:   cluster:     id: 23e72372-0d44-4cad-b24f-3641b14b86f4     health: HEALTH_ERR     1 MDSs report slow metadata IOs     1 MDSs report slow
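When ceph status shows HEALTH_ERR like the above, the usual next step is to drill down; a short sketch:

    ceph health detail            # expands every health check, including the slow metadata IOs
    ceph pg dump_stuck unclean    # lists the problem PGs and their acting OSDs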

Re: [ceph-users] radosgw index all keys in all buckets [EXT]

2019-05-13 Thread Matthew Vernon
Hi, On 02/05/2019 22:00, Aaron Bassett wrote: > With these caps I'm able to use a python radosgw-admin lib to list > buckets and acls and users, but not keys. This user is also unable to > read buckets and/or keys through the normal s3 api. Is there a way to > create an s3 user that has read
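A sketch of the radosgw-admin side of this, with a hypothetical uid and bucket; the caps shown are the usual admin-API read caps:

    radosgw-admin caps add --uid=audit \
        --caps="users=read; buckets=read; metadata=read; usage=read"
    radosgw-admin bucket list --bucket=somebucket   # lists the objects (keys) in one bucket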

Re: [ceph-users] Post-mortem analisys?

2019-05-13 Thread Martin Verges
Hello Marco, first of all, hyperconverged setups with publicly accessible VMs could be affected by DDoS attacks or other harmful issues that cause cascading errors in your infrastructure. Are you sure your network worked correctly at the time? -- Martin Verges Managing director Mobile: +49 174

[ceph-users] Post-mortem analisys?

2019-05-13 Thread Marco Gaiarin
[It's not really a 'mortem', but...] Saturday afternoon, my 3-node Proxmox Ceph cluster had a big 'slowdown' that started at 12:35:24 with an OOM condition on one of the 3 storage nodes, followed by OOM on another node at 12:43:31. After that, all the bad things happened: stuck
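For this kind of after-the-fact analysis the kernel log is usually the first stop; a sketch (the date shown is illustrative, adjust it to the Saturday in question):

    # find the OOM kills around 12:35 and 12:43
    dmesg -T | grep -iE 'out of memory|oom-killer'
    journalctl -k --since "2019-05-11 12:30" --until "2019-05-11 13:00" | grep -i oom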

Re: [ceph-users] Slow requests from bluestore osds

2019-05-13 Thread Marc Schöchlin
Hi Manuel, hello list, what is the reason for manually compacting the OSD? In past debugging sessions we saw that slow requests appeared in correlation with compaction messages from the involved OSDs. In this situation we haven't seen that (see the logfile below in my last mail). As described, our
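For reference, compaction can be triggered and correlated with slow requests per OSD via the admin socket; a sketch with a hypothetical OSD id:

    ceph daemon osd.7 compact                       # trigger a manual RocksDB compaction
    ceph daemon osd.7 perf dump | grep -i compact   # compaction-related counters
    ceph daemon osd.7 dump_historic_ops             # check whether slow ops coincide with compaction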